Leonardo Vida

Full Stack Data Engineer biased towards building products that use (a lot of) data

Amsterdam, the Netherlands, CET


About

As a Data Engineer, I have single-handedly built data platforms for enterprise clients. Recently, I led a team of 5 for ~6 months at a large corporation, successfully delivering the product we were building, and I aim to lead teams again in the future. Currently, I work as a consultant for companies in the Netherlands, mostly using Python, HCL and TypeScript.

Work Experience

DBC

Dec 2023 - Present

Senior Data Engineer Consultant

  • For client A, deployed a data platform on Azure using IaC (Databricks, dbt) to centralize data from a growing number of subsidiaries
  • For client A, built custom API wrappers for ELT across 10+ sources, alongside custom libraries for pipelines and data tests
  • For client B, developed and productionized an advanced custom RAG solution on Azure over 1M+ files, used by 700+ employees
  • For client B, deployed a self-hosted LLM that reduced costs 40x, along with trace monitoring and system telemetry (LangSmith, Azure)
  • For client C, deployed a data platform on AWS using IaC (Airbyte, Databricks, Dagster) and provided infrastructure to the backend team
  • Developed internal Terraform blueprints and modules for a range of data platform solutions and client sizes
  • Led development of an internal recruitment co-pilot project, currently being tested at a partner company
  • Mentored junior team members and led trainings and client workshops on applying and productionizing LLMs

Prima assicurazioni

Remote

Jun 2023 - Nov 2023

Senior Data Engineer

  • Supported development of a self-service data platform on AWS based on data mesh principles (dbt, Airbyte, Argo)
  • Refactored the legacy in-house config-driven ETL package and developed supporting libraries for the self-service data platform
  • Engineered petabyte-scale ETL processes in PySpark with a focus on data quality and pipeline efficiency
  • Implemented agile product management, boosting efficiency and team morale
  • Defined company-wide data product, data contract and permission specifications, and supported the team's roadmap definition

Brenntag

Oct 2022 - May 2023

Senior Data Engineer

  • Led the core Data Engineering team of 6 developers in the newly created Data department
  • Architected and developed an AWS-native data platform with a focus on security and data quality (Glue, Iceberg, MWAA)
  • Developed all core pipelines for EMEA and NA subsidiaries, plus the main libraries for data processing and quality monitoring
  • Deployed SLA monitoring for critical datasets, ensuring high data availability and integrity
  • Managed team planning and rituals, and translated business needs into prioritized technical requirements

Beerwulf / Heineken

Aug 2021 - Sep 2022

Data Engineer

  • Migrated all core ETL pipelines on Databricks from batch to micro-batch Structured Streaming, and then to Delta Live Tables (DLT)
  • Refactored and enhanced the data observability library and introduced automated data tests across all medallion layers
  • Deployed an MLOps platform on MLflow and integrated the new B2B2C marketplace and D2C product into the data infrastructure
  • Built ML models to forecast churn, LTV and demand, improving demand forecasting accuracy by 40%

Utrecht University

Aug 2020 - Jul 2021

Research Engineer

  • Led engineering for a research project and fine-tuned transformer models using a $100k+ GCP grant
  • Developed a custom pipeline to extract, process, OCR and score texts from the entire collection of the Dutch national library
  • Developed back-end security for asreview, an OSS package that automates systematic reviews, with more than 150,000 downloads on PyPI
  • Developed osmenrich, an OSS spatial data package in R for enriching sensitive data

Education

Utrecht University

2020 - 2022
M.Sc. in Computational Science: Applied Data Science; GPA: 8.1/10 (cum laude)

Maastricht University

2014 - 2017
B.Sc. in Economics: International Economics; GPA: 8.2/10

Skills

Python
TypeScript
Next.js
SQL
Terraform HCL/Docker/Kubernetes
GraphQL
R
LangChain, LlamaIndex, open-source LLMs
