
Hello, I'm Miguel
About me
I began my journey in the ever so small country of Luxembourg & moved to Portugal at the age of 17 to pursue my studies. Since then, I have carved out a niche for myself in the world of data, balancing three roles on Databricks: data engineer, data scientist & GenAI engineer.
I've always been a guy of odd interests. When I was younger, I was really passionate about martial arts and computers. This naturally led me to build my own computer and join my first boxing gym.
Fast forward a few years: I box at an amateur level, with aspirations of becoming national champion in 2025, and I also ... play chess. Chess is my go-to chill activity. It's kind of like the calm, strategic counterpoint to the adrenaline rush of boxing. Some people find it odd; I find it complementary.
Something else that defines me is my love of meeting new & different people, probably because I was born in Luxembourg and was always surrounded by different cultures and languages. Either way, it has made me pretty open-minded and really sparked my passion for travelling.
Reach out!
Favorite projects 👨‍💻👑
LLM RAG Application
Built a GenAI Retrieval-Augmented Generation (RAG) application on Databricks that integrates SharePoint documents to power a conversational AI assistant.
- PySpark
- Databricks
- LangGraph
- CI/CD
- MLflow
- DBX Vector Search
- DBX Model Serving
Thesis on CNNs for Melanoma Classification
Built a CNN pipeline on Azure for melanoma detection using transfer learning and advanced data augmentation, with real-time SMS prognostics via Twilio.
- Python
- TensorFlow
- Keras
- Azure ML
- CNN
- Twilio
Databricks Server Comparator
Implemented a scalable Databricks-based comparator using PySpark and Spark SQL for primary-key-level, cross-table row validation, detecting mismatches in Delta Lake tables at scale.
- Python
- PySpark
- Spark SQL
- Databricks
- ETL
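To illustrate the idea behind primary-key-level row validation, here is a minimal plain-Python sketch. It is not the project's code: the real comparator runs on PySpark and Spark SQL over Delta Lake tables, while this version uses in-memory lists of dicts so it can run anywhere. The function name `compare_by_pk` and the sample rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def compare_by_pk(source: List[Row], target: List[Row], pk: str) -> Dict[str, list]:
    """Match rows of two tables on a primary key and report differences."""
    # Index each table by its primary key (assumes keys are unique per table).
    src = {row[pk]: row for row in source}
    tgt = {row[pk]: row for row in target}

    # Keys present on one side only.
    missing_in_target = sorted(src.keys() - tgt.keys())
    missing_in_source = sorted(tgt.keys() - src.keys())

    # For keys on both sides, list the columns whose values differ.
    mismatched = []
    for key in sorted(src.keys() & tgt.keys()):
        columns = src[key].keys() | tgt[key].keys()
        diff = sorted(c for c in columns if src[key].get(c) != tgt[key].get(c))
        if diff:
            mismatched.append((key, diff))

    return {
        "missing_in_target": missing_in_target,
        "missing_in_source": missing_in_source,
        "mismatched": mismatched,
    }


# Hypothetical sample data: two copies of the same table on different servers.
source_rows = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}, {"id": 3, "qty": 7}]
target_rows = [{"id": 1, "qty": 10}, {"id": 2, "qty": 6}, {"id": 4, "qty": 1}]

report = compare_by_pk(source_rows, target_rows, pk="id")
print(report)
```

On distributed tables the same three checks would typically be expressed as joins on the key column rather than dictionary lookups, so that no table has to fit in memory.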
My skills
- Python
- R
- SQL & Spark SQL
- PySpark
- Databricks
- Azure
- Azure Machine Learning Studio
- CI/CD - Azure DevOps
- Delta Lake/Azure Data Lake Storage
- ETL/ELT pipelines
- Data modeling & warehousing
- Databricks Medallion Architecture
- Machine learning frameworks (TensorFlow, PyTorch, Keras, scikit-learn)
- Deep Learning
- Hugging Face Transformers
- LangChain
- LangGraph
- LLMOps (vector search, model serving, MLflow)
My experience
GenAI Engineer
NOKIA
Led end-to-end development of a production-grade GenAI/LLM Retrieval-Augmented Generation (RAG) application on Databricks, integrating Nokia SharePoint supply-chain documents and metadata to power a conversational AI assistant for self-service reporting.
Data Scientist/Engineer
NOKIA
In charge of creating & enhancing complex supply chain data objects that the business can use for end-to-end reporting, translating business requests into data solutions. Used PySpark for automation, data manipulation & ETL in a Databricks environment.
Data Science Trainee in Rotational Program
NOKIA - Financial Planning and Reporting Analytics
Led the optimization of SAP report generation by recreating complex reports with advanced SQL queries on a centralized data platform holding ingested SAP tables, significantly reducing processing time. Wrote Python scripts to automate report validation.
Data Science Trainee in Rotational Program
NOKIA - Data Semantics Team
Contributed to data migration from on-premises servers to Azure cloud by performing User Acceptance Testing (UAT) with SQL. Led development of a Python-based Excel file comparator that checks data consistency between the on-premises servers and Azure cloud by locating and reporting any discrepancies.
Data Science Trainee in Rotational Program
NOKIA - Process Mining Team
Automated daily data extraction and API updates from the Celonis API using Azure Function Apps. Automated extraction, processing, and ingestion of Celonis audit log API data into data pools using Python. Developed a star schema, associated tables (SQL) & a dashboard. Ran a classification project in the Celonis Machine Learning Workbench to predict discount classification: descriptive analysis with Matplotlib, coded with scikit-learn.
Data Science Trainee in Rotational Program
NOKIA - Supply Chain Advanced Analytics Team
Worked on a classification project using Azure Machine Learning Studio to predict on-time delivery of supply chain products. Migrated data pipelines from on-premises databases to Azure SQL Database using Azure Data Factory.
Master of Analytics Specialization in Data Science
Católica Lisbon SBE
I graduated with a master's in data science applied to business, where I truly fell in love with the art of data science.
Bachelor's in Economics
Nova Lisbon SBE
I graduated with a bachelor's in Economics.
PC Assembly
Lisbon, Portugal
I built my own computer from scratch for fun!
Howdy partner! I'm vi, your rootin'-tootin' AI Agent that knows all about Miguel. Ask me anything about this cowboy!


