
Senior Data Engineer (Data Scraping), Madrid

Indeed
Full-time
Onsite
No experience limit
No degree limit
Puerta del Sol, 4, Centro, 28013 Madrid, Spain

Description

Job Summary: We are seeking a Senior Data Scraping Analysis Specialist with Python experience to build intelligent crawling pipelines and perform large-scale data extraction within AWS ecosystems, connecting external data sources with internal systems and AI agents.

Key Highlights:
1. Building intelligent crawling pipelines on AWS
2. Mastery of classical and AI-driven scraping techniques
3. Collaboration with Data Science, AI, and Backend teams

Senior Data Engineer (Data Scraping)

We are seeking a Senior Data Scraping Analysis Specialist with strong Python expertise who wants to advance their career by building high-performance pipelines for intelligent crawling and large-scale data extraction, deployed in AWS ecosystems.

CONTEXT AND RESPONSIBILITIES

The selected candidate will join the Functional Team with the critical mission of connecting external information sources to internal analytics systems and to new cloud-based AI agents. The role involves designing and maintaining advanced scraping and crawling pipelines that operate at scale in AWS environments while ensuring resilience, traceability, observability, and compliance with security standards.

Proficiency in classical scraping techniques (Playwright, Selenium, BeautifulSoup) is essential, alongside emerging AI-driven solutions such as Firecrawl, Crawl4AI, or LLM agents that can automate navigation and content extraction on dynamic and heavily protected websites. The specialist must also process and transform large volumes of data within cloud-native architectures and integrate the results into the organization’s analytical systems.

PROJECT AND TEAM

This project aims to fully automate external data acquisition and make the data available in AWS to feed analytical platforms and Generative AI models. The work includes developing intelligent crawlers, anti-bot strategies, and proxy rotation, as well as structuring unstructured data into formats optimized for downstream consumption.
The selected candidate will work closely with Data Scientists, AI Engineers, and Backend teams under the supervision of the Product Manager, in line with the architectural guidelines defined for AWS environments. The ecosystem integrates services such as Lambda, ECS, S3, Step Functions, and distributed databases, so the ability to design cloud-native pipelines will be key to success in this role.

EXPERIENCE AND KNOWLEDGE

We are seeking a candidate with at least 4 years of experience in advanced scraping and data analysis, and deep specialization in Python applied to large-scale crawling and web automation. Experience building distributed scrapers on AWS and recent exposure to AI-driven scraping technologies will be especially valued.

**Required experience includes:**

* Core Scraping & Crawling:
  * Playwright, Selenium, BeautifulSoup, Requests / aiohttp
  * Firecrawl, Crawl4AI, Browserless, or LLM agents for intelligent crawling
  * Anti-bot strategies, proxy rotation, and browser fingerprinting
* Data Engineering & Processing:
  * Python (Pandas, Polars, PySpark)
  * ETL/ELT pipelines, normalization and cleaning of large-scale data
  * Advanced parsing (HTML, JSON, XML, structured and unstructured documents)
* AWS Infrastructure (mandatory):
  * S3, Lambda, ECS/ECR, Step Functions
  * CloudWatch (crawler monitoring), IAM (permission segmentation)
  * SQS/SNS (orchestration and communication)
  * AWS Glue or EMR (desirable)
* Databases:
  * PostgreSQL, MySQL, MongoDB, or DynamoDB
  * Data integration and storage model design for high-volume scenarios

Additionally, the following experience or knowledge will be viewed favorably:

* Orchestration: Airflow, Prefect, or Dagster
* Serverless infrastructure and containers optimized for crawling
* Data integration with LLMs, RAG pipelines, or intelligent agents
* Data visualization or exploratory data analysis
* Design of highly concurrent distributed pipelines

HIRING AND LOCATION

This position is based in Madrid and offers a full-time contract with long-term stability. Given the project’s criticality and the need for close collaboration with business and technical teams, the role requires on-site presence at the offices under a hybrid model (typically 3 days on-site and 2 days remote).

Playwright, Selenium, BeautifulSoup, Firecrawl, Crawl4AI

Source: Indeed
David Muñoz
Indeed · HR

Company

Indeed
© 2025 Servanan International Pte. Ltd.