Xihang Dai
AI Engineer — RAG pipelines and LLM-powered agent systems
Education
Beijing University of Posts and Telecommunications
09/2019 — 06/2023
Bachelor of Engineering, Telecommunications Engineering and Management
Experience
China Unicom Global Limited
Assistant AI Engineer
09/2023 — Present
- AI engineer specialising in LLM-driven agent architectures and knowledge-centric systems, with proven experience in enterprise-scale deployment, evaluation, and heterogeneous accelerator environments.
- Led end-to-end on-premises LLM deployment in isolated enterprise environments using Docker on GPU/NPU infrastructure; optimised inference performance for high-concurrency workloads and developed a comprehensive RAG evaluation framework.
- Experienced with LangGraph and Dify for agent development; working knowledge of OpenClaw for agent runtime orchestration. Hands-on with state management, short-/long-term memory design, and multi-turn context maintenance.
- Strong interest in long-term planning and self-evolving agent systems, with the goal of leveraging isolated-environment deployment experience to explore federated learning and privacy-enhancing techniques for agentic workflows.
Projects
Intelligent Question Answering Platform
Enterprise Knowledge Platform
09/2023 — Present
- Designed and implemented end-to-end Python pipelines for ingesting, preprocessing, and indexing multi-format knowledge sources (documents, structured/semi-structured data), using Surya OCR for scanned and image-based content to build a unified enterprise knowledge base.
- Built an LLM-based RAG backend using FAISS as the vector store, BGE embeddings for retrieval, BGE reranker for re-ranking, and Qwen2.5-72B for answer generation, significantly improving answer accuracy.
- Developed source traceability and fault-tolerance mechanisms, enhancing system trustworthiness, stability, and usability.
- Deployed Docker-based containerised model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B), ensuring cross-platform compatibility and scalable deployment.
- Built a RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts for metric calculation and result summarisation, enabling efficient and objective assessment of retrieval quality and answer reliability.
Media Center Knowledge Q&A and AI Writing Agent
Multimodal Document QA / AI Writing
09/2023 — Present
- Developed a multimodal document QA system supporting scanned documents, image-only files, and mixed-content inputs by integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing, enabling QA over documents containing images.
- Fine-tuned and deployed a BERT-based classification model using PyTorch and Hugging Face to replace an LLM-only classification pipeline across six marketing use cases, improving robustness and reducing average latency from 4.5s to 300ms.
- Introduced web search capability to mitigate stale model knowledge, improving the timeliness of generated responses.
- Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores for persistent context, significantly improving coherence and correctness in multi-turn, complex-task reasoning.
Skills
LLM & Agent Systems:
Retrieval Augmented Generation (RAG), Prompt Engineering & Fine-tuning, Agent Memory & State Management, Intent Classification, Agent Orchestration (LangGraph)
Agent Platforms & Autonomous Tools:
Claude Code, OpenClaw, Cursor, Dify
Programming & Machine Learning:
Python, PyTorch, TensorFlow
Infrastructure, Deployment & Runtime:
Linux, Docker, Local LLM Deployment, GPU/NPU Accelerated Inference (NVIDIA RTX 4090D, Huawei Ascend 910B)
Languages:
Mandarin (Native), English (Fluent), Cantonese (Basic)
Certifications:
PCAD™ – Certified Associate Data Analyst with Python (OpenEDG Python Institute, 07/2025), Tencent Cloud Solution Architect Professional Engineer (11/2024)