RAG-Powered Enterprise Knowledge Assistant
Built an intelligent Q&A system using Retrieval-Augmented Generation (RAG) over 50,000+ internal documents. Integrated LangChain, FAISS vector store, and GPT-4/Claude APIs to provide instant, sourced answers to employee queries, reducing support ticket volume by 40%.
LangChain, FAISS, FastAPI, Python, Docker
40% reduction in support tickets
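The retrieval step of a RAG system like the one described above can be sketched in a few lines. This is a minimal illustration, not the production implementation: the toy three-dimensional embeddings, the document IDs, and the `top_k` helper are all hypothetical, and a brute-force cosine search stands in for the FAISS index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy embeddings; real ones would come from an embedding model.
index = {
    "vpn-setup.md": [0.9, 0.1, 0.0],
    "expense-policy.md": [0.1, 0.9, 0.1],
    "onboarding.md": [0.4, 0.4, 0.2],
}

hits = top_k([1.0, 0.0, 0.0], index, k=2)
sources = [doc_id for doc_id, _ in hits]
```

In a full pipeline the retrieved document IDs would travel with the passages into the LLM prompt, which is what lets the answer cite its sources.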
AI-Powered Insurance Policy Analyzer
Developed a GenAI system for Prudential that automatically analyzes insurance policies, extracts key terms, and generates risk summaries. Used LLMs with structured output parsing and prompt engineering to automate what previously required hours of manual review per policy.
LangChain, OpenAI, Pydantic, Streamlit, PostgreSQL
85% faster policy analysis
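Structured output parsing, as mentioned above, means validating the model's reply against a schema before anything downstream trusts it. The sketch below is an assumption-laden illustration: the `PolicySummary` fields and the sample reply are hypothetical, and a standard-library dataclass stands in for the Pydantic model the project lists in its stack.

```python
import json
from dataclasses import dataclass

@dataclass
class PolicySummary:
    # Hypothetical schema; the real extracted fields are not given in the source.
    policy_number: str
    coverage_limit: float
    exclusions: list

REQUIRED_FIELDS = {"policy_number", "coverage_limit", "exclusions"}

def parse_policy_summary(raw: str) -> PolicySummary:
    """Parse an LLM reply that is expected to be a JSON object, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["exclusions"], list):
        raise ValueError("exclusions must be a list")
    return PolicySummary(
        policy_number=str(data["policy_number"]),
        coverage_limit=float(data["coverage_limit"]),
        exclusions=[str(item) for item in data["exclusions"]],
    )

# Hypothetical model reply used only for illustration.
llm_reply = '{"policy_number": "P-001", "coverage_limit": "250000", "exclusions": ["flood"]}'
summary = parse_policy_summary(llm_reply)
```

Rejecting malformed replies at this boundary allows a retry or a fallback to human review instead of letting a bad extraction reach the risk summary.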
Multi-Agent AI Workflow for Market Research
Designed a multi-agent system using CrewAI and LangGraph where specialized AI agents collaborate to perform market research: one scrapes data, another analyzes competitors, and a third generates executive reports. Reduced research cycle from 2 weeks to 2 days.
CrewAI, LangGraph, Python, Qdrant, Gradio
85% faster research cycle
Custom LLM Fine-Tuning for Financial Compliance
Fine-tuned open-source LLMs (LLaMA, Mistral) on financial regulation datasets for a fintech client. The specialized model automatically classifies transactions, flags compliance issues, and generates audit-ready explanations, achieving 96% accuracy on regulatory queries.
Hugging Face, LoRA, PyTorch, AWS SageMaker, MLflow
96% regulatory query accuracy
ML-Based Delivery Time Estimator
Developed an ensemble ML model (XGBoost + LightGBM) at Olist that predicted delivery times 33% more accurately than the previous rule-based system. Integrated geospatial features, carrier performance data, and seasonal patterns. Reduced customer complaints by 19%.
XGBoost, LightGBM, Scikit-learn, Flask, Docker
33% accuracy improvement, 19% fewer complaints
Real-Time Delivery Risk Prediction System
Built a real-time risk scoring engine at Delivery Center that monitors active deliveries and predicts failure probability. The system triggers proactive interventions (driver reassignment, customer notifications) when risk thresholds are exceeded, preventing delivery failures before they happen.
Python, Scikit-learn, FastAPI, Redis, PostgreSQL
Proactive failure prevention
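The threshold-triggered interventions described above can be sketched as a small decision function. The cut-off values (0.5 and 0.8), the action names, and the `choose_intervention` helper are illustrative assumptions; the source does not state the real thresholds or how actions are dispatched.

```python
def choose_intervention(failure_probability, notify_at=0.5, reassign_at=0.8):
    """Map a predicted failure probability to a proactive action (or None)."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    if failure_probability >= reassign_at:
        return "reassign_driver"   # highest risk: change who delivers
    if failure_probability >= notify_at:
        return "notify_customer"   # medium risk: warn the customer early
    return None                    # low risk: no action needed

# Hypothetical scores for three active deliveries.
actions = {
    delivery_id: choose_intervention(score)
    for delivery_id, score in {"d1": 0.92, "d2": 0.61, "d3": 0.10}.items()
}
```

Keeping the thresholds as parameters makes it straightforward to tune them against the cost of a false alarm versus a missed failure.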
Time-Series Demand Forecasting & Pricing Engine
Created a demand forecasting system combining ARIMA, Prophet, and gradient boosting models at Delivery Center. Paired with a margin-aware optimization layer that dynamically adjusts pricing based on demand, capacity, and cost signals, driving 7% revenue growth.
Prophet, XGBoost, Optuna, Airflow, Databricks
7% revenue growth
Customer Churn Prediction & Retention System
Developed a churn prediction model for a retail client that identifies at-risk customers 30 days before they churn. Combined behavioral features, RFM analysis, and survival models to generate risk scores and personalized retention strategies, reducing churn by 18%.
LightGBM, Scikit-learn, PySpark, Airflow, Streamlit
18% churn reduction
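RFM analysis, one of the feature sources named above, summarizes each customer by recency, frequency, and monetary value. The sketch below assumes a hypothetical order log of `(customer_id, order_date, amount)` tuples and a fixed reference date; the production features were presumably computed in PySpark rather than plain Python.

```python
from datetime import date

def rfm_table(orders, today):
    """Aggregate (customer_id, order_date, amount) rows into RFM features."""
    table = {}
    for customer_id, order_date, amount in orders:
        row = table.setdefault(
            customer_id, {"last_order": order_date, "frequency": 0, "monetary": 0.0}
        )
        row["last_order"] = max(row["last_order"], order_date)
        row["frequency"] += 1
        row["monetary"] += amount
    return {
        customer_id: {
            "recency_days": (today - row["last_order"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer_id, row in table.items()
    }

# Hypothetical order log used only for illustration.
orders = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 3, 1), 80.0),
    ("c2", date(2023, 11, 20), 40.0),
]
features = rfm_table(orders, today=date(2024, 3, 31))
```

A long recency combined with low frequency, as for customer `c2` here, is the kind of signal a churn model would weigh alongside behavioral features.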
Genetic Programming for Cancer Prediction
Capstone research project at UFPR developing a novel genetic programming-based clustering algorithm for gene expression analysis. The algorithm discovers patterns in high-dimensional biological data to assist in cancer type classification and prediction.
Python, Evolutionary Algorithms, NumPy, Scikit-learn
Novel research contribution
Absenteeism & Turnover Prediction Models
Built predictive ML models (via Facilia / triggo.ai, client: Guima Conseco) to anticipate employee absenteeism and voluntary turnover. Combined HR data, operational signals, and behavioral features to generate risk scores that feed into retention and workforce-planning workflows.
Python, LightGBM, Scikit-learn, Pandas, Databricks
Proactive HR decision-making
NLP Product Categorization Engine
Built an NLP-based automatic product categorization system at Olist that analyzes product titles, descriptions, and attributes to classify items into the correct taxonomy. Reduced manual input by 65% and improved catalog quality, directly impacting customer satisfaction.
spaCy, TF-IDF, Scikit-learn, FastAPI, PostgreSQL
65% reduction in manual classification
Customer Review Sentiment Analysis Pipeline
Developed an end-to-end sentiment analysis pipeline that processes thousands of customer reviews daily, extracting sentiment, key topics, and actionable insights. Used transformer-based models (BERT) fine-tuned on domain-specific data to achieve 92% accuracy on Portuguese text.
Hugging Face, BERT, PyTorch, Airflow, Streamlit
92% sentiment accuracy
Intelligent Document Processing (IDP) System
Built an IDP system that extracts structured data from unstructured documents (invoices, contracts, reports) using OCR + NLP + LLMs. The system handles multiple document formats, validates extracted fields, and integrates with client ERP systems via API.
Tesseract, LangChain, spaCy, FastAPI, Docker
90% automation rate
LLM-Powered Intelligent Web Scraper
Developed an adaptive web scraping system that uses LLMs to understand page structure and extract data without brittle CSS selectors. The AI agent navigates dynamic pages, handles pagination, CAPTCHAs, and anti-bot measures, and outputs clean structured data in any schema.
LangChain, Playwright, OpenAI, Python, MongoDB
95% extraction accuracy on dynamic sites
E-commerce Competitive Price Monitor
Built a scalable price monitoring system that scrapes 100,000+ products daily across major e-commerce platforms. Features include price change alerts, historical price tracking, competitor analysis dashboards, and automated repricing recommendations based on market positioning.
Scrapy, Selenium, PostgreSQL, Airflow, Streamlit
100K+ products monitored daily
Real Estate Market Intelligence Platform
Created an automated data collection platform that scrapes property listings, rental prices, and market trends from multiple real estate portals. Includes geocoding, deduplication with fuzzy matching, and a dashboard for market analysis and investment decision support.
Scrapy, BeautifulSoup, PostgreSQL, Plotly, Docker
500K+ listings tracked
B2B Lead Generation & Enrichment Engine
Built a lead generation system that scrapes business directories, LinkedIn profiles, and company websites to build enriched prospect databases. Uses NLP to classify industry, company size, and technologies used, enabling highly targeted outreach for sales teams.
Playwright, spaCy, Python, MongoDB, FastAPI
10K+ enriched leads/month
Big Data Pipeline Optimization (6,000x Speedup)
Re-architected and optimized data pipelines at ALLOS using PySpark, SQL, and ML techniques. Transformed batch processing jobs that took hours into near-real-time pipelines, achieving a 6,000x performance improvement. Enabled data freshness for executive dashboards.
PySpark, Databricks, SQL, Delta Lake, Airflow
6,000x processing speedup
Regulatory-Grade Data Infrastructure (Fintech)
Designed and implemented the complete data infrastructure at Swap covering ingestion, transformation, storage, and governance. Built to Central Bank of Brazil standards with 100% transaction traceability, audit trails, and data quality monitoring, critical for Payment Institution certification.
Python, Airflow, PostgreSQL, Docker, AWS
100% compliance achieved
Real-Time ETL Pipeline with Event Streaming
Built a real-time data ingestion and transformation pipeline that processes millions of events daily from multiple sources. Uses event-driven architecture with message queues for reliable delivery, schema evolution support, and automated data quality checks.
Apache Kafka, PySpark, Airflow, PostgreSQL, Docker
Millions of events/day processed
Dimensional Data Warehouse (Gold Layer)
Designed and implemented the dimensional Gold layer of the data warehouse (via Facilia / triggo.ai, client: Guima Conseco). Modeled facts and dimensions aligned with business processes, powering KPIs, operational metrics, and executive dashboards, and feeding downstream predictive models.
SQL, PySpark, Databricks, Star Schema, Airflow
Unified business metrics layer
Automated Customer Onboarding System
Designed and implemented an intelligent onboarding automation system at Olist that streamlined the seller registration process. Automated document validation, data verification, and account setup workflows, reducing completion time by 67% (from 3 weeks to 1 week).
Python, Airflow, PostgreSQL, REST APIs, Docker
67% faster onboarding
Financial Report Generation Automation
Built an end-to-end automated reporting system that pulls data from multiple sources, runs validation checks, applies business rules, and generates formatted financial reports. Includes anomaly detection that flags unusual values for human review before distribution.
Python, Pandas, Airflow, Jinja2, Slack API
90% reduction in manual reporting
Transaction Reconciliation Automation
Developed an automated reconciliation system at Swap that matches financial transactions across multiple payment processors, banks, and internal systems. Handles edge cases like partial payments, refunds, and chargebacks with 100% traceability for audit compliance.
Python, PySpark, PostgreSQL, Airflow, FastAPI
100% transaction traceability
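The matching logic described above, including partial payments and refunds, can be illustrated with a small sketch. The record shapes, the status labels, and the `reconcile` helper are hypothetical; the real system matches across several processors and banks, which this single-source example does not attempt.

```python
from decimal import Decimal

def reconcile(ledger, settlements):
    """Compare expected amounts per transaction with what was actually settled.

    ledger: {txn_id: expected Decimal amount}
    settlements: iterable of (txn_id, Decimal amount); refunds are negative.
    """
    settled = {}
    for txn_id, amount in settlements:
        settled[txn_id] = settled.get(txn_id, Decimal("0")) + amount

    report = {}
    for txn_id, expected in ledger.items():
        paid = settled.get(txn_id)
        if paid is None:
            report[txn_id] = "missing"
        elif paid == expected:
            report[txn_id] = "matched"
        elif paid < expected:
            report[txn_id] = "partial"
        else:
            report[txn_id] = "overpaid"
    # Settlements with no ledger entry still get a status, so nothing is dropped.
    for txn_id in settled.keys() - ledger.keys():
        report[txn_id] = "unexpected"
    return report

# Hypothetical data: T1 is paid in two installments, T2 only in part.
ledger = {"T1": Decimal("100.00"), "T2": Decimal("50.00"), "T3": Decimal("20.00")}
settlements = [
    ("T1", Decimal("60.00")),
    ("T1", Decimal("40.00")),
    ("T2", Decimal("30.00")),
    ("T4", Decimal("5.00")),
]
report = reconcile(ledger, settlements)
```

Using `Decimal` rather than floats avoids rounding drift in monetary comparisons, and assigning every transaction ID a status is one simple way to keep the result fully traceable.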
LLM-Driven Healthcare Data Pipeline
Designed an end-to-end automated pipeline (via Facilia / triggo.ai, client: Funcional Health) for ingesting, cleaning, and storing healthcare establishment data from CNES/DataSUS into a data-warehouse-like layer. An LLM-controlled web scraping crawler enriches information on 200K+ establishments, replacing a previously manual curation process.
Python, LangChain, Playwright, Airflow, PostgreSQL
200K+ establishments automated
Investment Funds Workflow RPA (VBA + RPA)
Automated day-to-day operational workflows in HSBC's investment funds department using VBA-based solutions and Robotic Process Automation (RPA). Significantly reduced execution time and operational errors in fund-related processes, while supporting the department's Business Continuity Plan (BCP).
VBA, RPA, Excel, Macros
Reduced execution time & errors
Retail Backoffice RPA Automation
Implemented Robotic Process Automation (RPA) routines in the Backoffice department of a major retail consulting client at Logic Information Systems. The automation reduced execution time and operational errors in planning and forecasting workflows powered by Oracle RPAS.
RPA, Oracle RPAS, VBA, SQL
Faster backoffice, fewer errors
Last-Mile Delivery Route Optimization
Built an intelligent route optimization engine at Delivery Center combining ML predictions with bio-inspired algorithms (genetic algorithms, ant colony optimization). The system considers real-time traffic, driver capacity, delivery windows, and cost constraints to generate optimal routes.
Python, OR-Tools, Genetic Algorithms, FastAPI, Redis
13% cost reduction, +22% satisfaction
Dynamic Pricing & Margin Optimization
Created a margin-aware pricing optimization system that balances revenue maximization with competitive positioning. Uses demand elasticity models, competitor pricing signals, and cost structures to recommend optimal price points in real-time, contributing to 7% revenue growth.
Python, Optuna, XGBoost, FastAPI, Databricks
7% revenue increase
Executive KPI Dashboard Suite
Designed and implemented a comprehensive KPI dashboard suite in Databricks at ALLOS covering sales performance, customer engagement, and operational efficiency. Features drill-down capabilities, automated alerts, and natural language summaries for C-level stakeholders.
Databricks, SQL, Plotly, Python, PySpark
Data-driven executive decisions
Fuzzy Matching for Loyalty Program Mapping
Built a fuzzy matching algorithm at ALLOS that maps stores to benefits in a loyalty program with 98% accuracy. Handles name variations, abbreviations, and typos across thousands of store entries, enabling automated benefit assignment that previously required manual curation.
Python, FuzzyWuzzy, PySpark, Databricks, SQL
98% matching accuracy
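Fuzzy matching of store names with typos and abbreviations, as described above, can be sketched with the standard library. Note the substitutions: `difflib` stands in for FuzzyWuzzy here, and the store names, the 0.8 cutoff, and the `match_store` helper are hypothetical choices for illustration.

```python
import difflib

def normalize(name):
    """Lowercase, replace periods with spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())

def match_store(raw_name, canonical_names, cutoff=0.8):
    """Return the best canonical store name for raw_name, or None if too dissimilar."""
    lookup = {normalize(name): name for name in canonical_names}
    best = difflib.get_close_matches(
        normalize(raw_name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[best[0]] if best else None

# Hypothetical canonical store list.
stores = ["Casa das Tintas", "Livraria Central", "Farmacia Popular"]

typo_match = match_store("Livraria Centrl", stores)
abbreviation_match = match_store("Farm. Popular", stores)
no_match = match_store("Pet Shop Amigo", stores)
```

Returning `None` below the cutoff, rather than always picking the closest name, leaves ambiguous entries for manual review instead of silently assigning the wrong benefit.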
Data Lake Architecture & Governance Framework
Architected and implemented a data lake at Prudential do Brasil with proper governance layers: data cataloging, quality monitoring, access controls, and lineage tracking. Enabled the organization to transition from siloed spreadsheets to a unified, trusted data platform.
AWS S3, Glue, Athena, Python, Airflow
Unified data platform established