Project Overview
This project is an automated web scraper for Jumia, a leading e-commerce platform in Morocco. Built with Python and Selenium WebDriver, it extracts structured product data from multiple categories, parses prices, calculates discounts, and updates stored records when prices change. It also handles popups, navigates paginated listings, and stores results in MongoDB for later analysis.
Extracted Data Fields
product_title
Complete product name extracted from listing pages
product_url
Direct link to the product detail page for reference
current_price
Current listed price with currency normalization
old_price
Original price before discount application (if available)
discount_percentage
Calculated percentage discount for promotional analysis
discount_quantity
Absolute discount value in local currency
inserted_at
Timestamp of initial data insertion for tracking
updated_at
Timestamp of most recent update for freshness tracking
published_at
Boolean field for custom publication workflow logic
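As an illustration of how the fields above fit together, here is a minimal sketch that assembles one record and derives the two discount fields from the prices. The function name build_product_record, the rounding to two decimals, and the default of False for published_at are assumptions made for this example, not details confirmed by the project.

```python
from datetime import datetime, timezone
from typing import Optional


def build_product_record(title: str, url: str, current_price: float,
                         old_price: Optional[float] = None) -> dict:
    """Assemble one product document using the field names listed above."""
    discount_quantity = None
    discount_percentage = None
    # Compute a discount only when an old price exists and exceeds the current one.
    if old_price is not None and old_price > current_price:
        discount_quantity = round(old_price - current_price, 2)
        discount_percentage = round(discount_quantity / old_price * 100, 2)

    now = datetime.now(timezone.utc)
    return {
        "product_title": title,
        "product_url": url,
        "current_price": current_price,
        "old_price": old_price,
        "discount_percentage": discount_percentage,
        "discount_quantity": discount_quantity,
        "inserted_at": now,
        "updated_at": now,
        "published_at": False,  # assumed default for the publication workflow flag
    }
```

For example, a current price of 800.0 with an old price of 1000.0 yields a discount_quantity of 200.0 and a discount_percentage of 20.0. When no old price is available, both discount fields stay empty.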
Core Features & Implementation
Smart Browser Automation
Selenium WebDriver with configurable Firefox profiles, headless mode support, and optimized performance settings
Popup Management
Detection and dismissal of promotional popups and overlay advertisements before scraping continues
Efficient Navigation
Smart category traversal and paginated result processing with configurable depth limits
Robust Price Parsing
Regex-based price extraction that handles multiple currency formats and localized number formatting
Data Validation
Deduplication and data integrity checks performed with MongoDB queries before each write
Price Update Tracking
Automatic price change detection with historical preservation and timestamp management
Modular Architecture
Clean separation of concerns: configuration, scraping logic, parsing utilities, and database operations
MongoDB Integration
Cloud-ready database storage with MongoDB Atlas, optimized queries, and efficient data operations
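The regex-based price parsing listed among the features could look roughly like the sketch below. It assumes price strings such as "1,299.00 Dhs", with a comma or space as the thousands separator and a dot as the decimal separator. Both the input format and the helper name parse_price are assumptions for illustration; the project's actual patterns may differ.

```python
import re
from typing import Optional

# First number in the string: a digit, then digits, commas or whitespace,
# optionally followed by a decimal part such as ".00".
_PRICE_RE = re.compile(r"\d[\d,\s]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[float]:
    """Return the first price found in `text` as a float, or None if absent."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators (commas and any whitespace) before converting.
    digits = re.sub(r"[,\s]", "", match.group(0))
    return float(digits)
```

When a listing shows a price range, this sketch keeps only the first (lowest listed) value. Text with no digits returns None, so the caller can skip the item or log it.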
Scraping Results & Performance
The scraper targets high-traffic categories, including Phones & Tablets and Electronics, and processes up to 5 pages per category. Deduplication keeps redundant inserts to a minimum, and price changes are applied as updates to existing records rather than as new entries.
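The 5-page limit per category can be sketched as a small URL generator. This example assumes that listing pages are addressed with a page query parameter, which the project description does not confirm. The example.com URL is a placeholder, not a real category address.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MAX_PAGES = 5  # depth limit stated above


def page_urls(category_url: str, max_pages: int = MAX_PAGES) -> List[str]:
    """Build the listing URLs for pages 1..max_pages of one category."""
    parts = urlsplit(category_url)
    query = dict(parse_qsl(parts.query))  # keep any existing query parameters
    urls = []
    for page in range(1, max_pages + 1):
        query["page"] = str(page)  # hypothetical pagination parameter
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls
```

Each generated URL would then be loaded in the browser in turn. A scraper may also stop early when a page returns no products.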
Technical Skills Demonstrated
MongoDB Atlas Integration

Connection string: mongodb+srv://username:password@cluster.mongodb.net/ (placeholder credentials)
Collection: jumia.products
Operations: Insert, Update, Query, Deduplication
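The insert, update, query, and deduplication operations listed above can be combined into a single step, sketched below. It assumes that product_url identifies a product uniquely and that a changed current_price is what triggers an update. Both are assumptions, as is the function name upsert_product. The collection argument is expected to behave like a pymongo Collection.

```python
from datetime import datetime, timezone


def upsert_product(collection, record: dict) -> str:
    """Insert a new product or refresh an existing one, keyed on product_url.

    Returns "inserted", "updated", or "unchanged".
    """
    now = datetime.now(timezone.utc)
    # Deduplication: look for a document with the same product URL.
    existing = collection.find_one({"product_url": record["product_url"]})
    if existing is None:
        collection.insert_one({**record, "inserted_at": now, "updated_at": now})
        return "inserted"

    # Same price as stored: nothing to write.
    if existing.get("current_price") == record["current_price"]:
        return "unchanged"

    # Price changed: overwrite the scraped fields but keep the original inserted_at.
    changes = {k: v for k, v in record.items() if k not in ("_id", "inserted_at")}
    changes["updated_at"] = now
    collection.update_one({"_id": existing["_id"]}, {"$set": changes})
    return "updated"
```

With pymongo installed, the collection named above would be obtained as MongoClient(uri)["jumia"]["products"], where uri is the connection string. Creating a unique index with collection.create_index("product_url", unique=True) is one way to enforce the deduplication rule at the database level as well.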