Welcome, I’m
Vincent Martenot
Lead AI & data science
Over the last 9 years, I’ve worked for a variety of companies to help them make the most of their data and integrate AI into their processes.

About me
Dynamic and passionate data scientist with a strong appetite for complex technical problems and cutting-edge technologies. I bring 9 years of experience leading high-stakes projects across multiple domains (healthcare, energy, banking, marketing), with substantial experience in MLOps and model interpretability, both in consulting firms and with end clients.
My technical expertise is enhanced by strong communication and management skills. I’m committed to fostering team growth in a supportive environment that encourages progress and quality work.
My passion for challenges led me to undertake the ambitious goal of completing my first degree at École Polytechnique alongside a double degree in fluid physics, taught in Chinese, in Beijing in 2015. This was also an opportunity to gain experience in academic research in partnership with EDF's R&D department.
My professional interests include artificial intelligence, machine learning, software development, as well as business organization and strategy.
Skills
Selected projects

NLP | HEALTHCARE
Literature Search Automation
This project deploys a cloud-based app that automates the detection of drug safety signals from adverse events in medical literature. Powered by three fine-tuned NLP models, the app enhances pharmacovigilance tasks by efficiently identifying potential safety concerns, streamlining experts’ daily monitoring and analysis efforts.
Technical stack: SQL, Kubeflow, EC2, S3, RDS, Docker
Achievements: Enhanced safety signal coverage by a factor of 3 (publication BMC – 2021)

NLP | FINANCE
News matching | Sentiment analysis
The project aimed to develop an application capable of gathering newspaper articles and Google News content, accurately linking the mentioned companies to their corresponding SIREN numbers, and evaluating their risk of imminent failure. The company database, comprising 20 million entities, was sourced from INSEE public data.
Technical stack: MongoDB, TF-IDF, BERT, AWS, Docker
Achievements: 85% F1-score in the top 3 matching results. Greatly improved the freshness of the information provided to users and increased their responsiveness.
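To illustrate the matching step, here is a minimal sketch of TF-IDF retrieval that returns the three closest reference names for a company mention. It is not the production code: the function names, the sample company names, and the choice of character trigrams are assumptions made for this example, and the BERT component is left out entirely.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded name into overlapping character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 norm; leave empty vectors as-is."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else vec


def build_index(names, n=3):
    """Compute smoothed IDF weights and unit-norm TF-IDF vectors for the reference names."""
    docs = [Counter(char_ngrams(name, n)) for name in names]
    doc_freq = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    idf = {g: math.log((1 + total) / (1 + df)) + 1 for g, df in doc_freq.items()}
    vectors = [normalise({g: tf * idf[g] for g, tf in doc.items()}) for doc in docs]
    return idf, vectors


def top_matches(mention, names, idf, vectors, k=3, n=3):
    """Rank reference names by cosine similarity to the mention and keep the top k."""
    counts = Counter(char_ngrams(mention, n))
    # n-grams never seen in the reference names are dropped from the query
    query = normalise({g: tf * idf[g] for g, tf in counts.items() if g in idf})
    scores = [sum(w * vec.get(g, 0.0) for g, w in query.items()) for vec in vectors]
    ranked = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [(names[i], round(scores[i], 3)) for i in ranked[:k]]


# Hypothetical reference names; a real index would map each name to its SIREN number.
NAMES = [
    "Boulangerie Dupont",
    "Garage Dupont et Fils",
    "Boulangerie Martin",
    "Transports Durand",
]
IDF, VECTORS = build_index(NAMES)
candidates = top_matches("Boulangerie Dupond", NAMES, IDF, VECTORS)
```

Character n-grams make the ranking tolerant of small spelling variations in how the press writes a company name, and keeping three candidates instead of one matches the way the top-3 score above is reported.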

NLP | HEALTHCARE
Medical report structuring in oncology
Built a pipeline to automate the extraction of complex medical information from French-language patient reports into a database serving multiple use cases in research, clinical studies, and patient care (OCR > pseudonymisation > cleaning > formatting > extraction > standardisation > coherence check).
Technical stack: BERT, LLM, AWS, Docker, React
Achievements: Created the first French language model for biomedical text – ALIBERT (publication ACL – 2021)

TABULAR DATA | HEALTHCARE
Patient journey clustering
Grouped patient journey sequences using a deep learning algorithm. The aim is to identify different disease pathways based on patient diagnoses and records. The results will be interpreted to help clinicians design new drugs that target specific disease pathways.
Technical stack: Auto-encoder, Python, UMAP, SPADE
Achievements: Created a new clustering methodology converted into technical assets for the company

TABULAR DATA | BANKING
Bank customer database structuring
Redesigned and adapted the structure of a customer database for retail banking.
Technical stack: SQL, Power BI

NLP | HEALTHCARE
Illegal medicinal product website detection
The project goal was to detect and classify websites offering illegal drugs or medical devices for sale.
Technical stack: LLMs, Apify, Google SerpAPI, Azure

TABULAR DATA | MARKETING
Customer targeting
Identified how and when to contact customers for after-sales service (annual servicing) in the automotive industry. Built scores for churn, product appetency, upgrades, etc.
Technical stack: SQL, Python, scikit-learn
Achievements: Contributed to the deployment of the system in 7 European countries

TABULAR DATA | MARKETING
Marketing Mix Modeling
Explained sales as a function of marketing investments and calculated the ROI of each channel (TV, radio, internet, leaflets, etc.).
Technical stack: Python, Scikit-learn

TIME SERIES | FINANCE
Stock picking recommendation
As a personal project, I designed and implemented an algorithm for automatically selecting the best CAC40 stocks based on their fundamentals over a 6-month period.
Technical stack: Python, Scikit-learn, Streamlit
Achievements: 70% accuracy (currently using it for my personal finance)

TABULAR DATA | ENERGY
Anomaly detection
The project aimed to detect anomalies in public lighting failures based on data collected from a computer-assisted maintenance system. The objective was to reduce the number of corrective lamp and electrical cabinet replacements carried out by workers, and thereby lower maintenance costs.
Technical stack: Python, Scikit-learn, Tableau Software