ML/AI Engineer (LLM Production)
Our client is a leading organization committed to innovation within the financial services sector. They foster a forward-thinking culture that emphasizes collaboration, continuous learning, and cutting-edge technology adoption. This role offers a unique opportunity to work on a high-impact, production-level large language model (LLM) project in a dynamic environment that values technical expertise and innovative problem-solving.
This position involves working on advanced AI implementations, including model fine-tuning, deployment, and optimization. You will play a key role in shaping the AI capabilities of a mission-critical system and directly contribute to its success. The role is perfect for a detail-oriented engineer who enjoys hands-on work and collaborating closely with a cross-functional team to deliver high-quality solutions.
Role Overview:
The ML/AI Engineer (LLM Production) will be responsible for developing, fine-tuning, and deploying large language models within our client's financial data environment. This role involves managing the full lifecycle of AI development—from data preparation and model training to production deployment and optimization—ensuring performance targets and SLAs are met. You will collaborate with senior engineers, learn best practices for enterprise deployment, and contribute to a live, impactful AI system.
Key Skills & Experience:
• Hands-on experience with LLM fine-tuning, RAG pipelines, or model serving
• Strong proficiency in Python
• Experience with deep learning frameworks such as PyTorch or TensorFlow
• Knowledge of relevant libraries like Hugging Face Transformers, LangChain, LlamaIndex, or vLLM
• Ability to read and write technical documentation in English
• Experience with AI coding assistants such as Claude Code or OpenAI Codex for rapid development and PoC testing
Key Responsibilities:
• Fine-tune open-weight language models on domain-specific data using techniques such as supervised fine-tuning (SFT) and quantization
• Build and optimize Retrieval-Augmented Generation (RAG) pipelines with vector databases and policy document ingestion
• Serve models efficiently using production inference engines, applying quantization and batching to meet latency and throughput SLAs
• Manage model deployment pipelines across various environments (DEV, UAT, PRE-PRD, PRD) on cloud infrastructure such as IBM watsonx.ai and OpenShift
• Use modern LLMs for code generation, prompt engineering, and pipeline optimization
• Evaluate emerging AI models and inference frameworks, producing validation reports and technical documentation
Requirements:
• Degree in Computer Science, Data Science, AI, or a related field
• Experience working on LLM fine-tuning, RAG pipelines, or model serving
• Strong Python skills
• Knowledge of machine learning fundamentals and applicable frameworks (PyTorch, TensorFlow)
• Familiarity with relevant libraries such as Hugging Face Transformers, LangChain, or vLLM
• Ability to read and produce technical documentation in English
• Eligible to work on an on-site basis in accordance with the assignment's location requirements
Nice to Have:
• Experience with LoRA, QLoRA, Unsloth, DPO, or RLHF fine-tuning techniques
• Knowledge of quantization methods (INT4, INT8) and inference optimization
• Experience with vector databases (Milvus, pgvector)
• Familiarity with IBM watsonx.ai, OpenShift, or Kubernetes
• Exposure to multilingual NLP, especially CJK datasets
• Prior experience in financial services or regulated industries
This is a permanent role offering flexible working hours and a hybrid work environment. The position is contracted on an Asia Contract basis, starting from 21/04/2026, and is paid at a monthly rate. The role provides an excellent opportunity to develop skills in cutting-edge AI and to contribute to impactful projects in the financial sector.
Candidates who meet these criteria are encouraged to apply to join a team dedicated to innovation and technical excellence.
