Xihang Dai
Education
Beijing University of Posts and Telecommunications
Sep 2019 – Jun 2023
Bachelor of Engineering, Telecommunications Engineering and Management
Experience
China Unicom Global Limited
Assistant AI Engineer
Sep 2023 – Present
- Designed and implemented end-to-end Python pipelines for ingesting, preprocessing, and indexing multi-format knowledge sources using Surya OCR to build a unified enterprise knowledge base.
- Built an LLM-based RAG backend using FAISS, BGE embeddings, BGE reranker, and Qwen2.5-72B for answer generation, significantly improving answer accuracy.
- Deployed Docker-based containerised model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B), ensuring cross-platform compatibility.
- Built a RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts for metric calculation and result summarisation.
- Developed a multimodal document QA system integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing, enabling question answering for image-containing documents.
- Fine-tuned a BERT-based classification model with PyTorch and Hugging Face to replace an LLM-only pipeline across six marketing use cases, reducing average latency from 4.5s to 300ms.
- Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores, improving coherence in multi-turn reasoning.
Projects
Intelligent Question Answering Platform
Enterprise Project
Sep 2023 – Present
- Designed and implemented end-to-end Python pipelines for ingesting, preprocessing, and indexing multi-format knowledge sources (documents, structured/semi-structured data) using Surya OCR.
- Built RAG backend with FAISS vector store, BGE embeddings, BGE reranker, and Qwen2.5-72B, significantly improving answer accuracy.
- Developed source traceability and fault-tolerance mechanisms, enhancing system trustworthiness and stability.
- Deployed Docker-based model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B).
- Built RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts.
Media Center Knowledge Q&A and AI Writing Agent
Enterprise Project
Jan 2024 – Present
- Developed a multimodal document QA system supporting scanned documents, image-only files, and mixed-content inputs by integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing.
- Fine-tuned and deployed a BERT-based classification model using PyTorch and Hugging Face, improving robustness and reducing average latency from 4.5s to 300ms across six marketing use cases.
- Added web search to address stale model knowledge, improving the timeliness of generated responses.
- Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores.
Skills
LLM & Agent Systems:
Retrieval Augmented Generation (RAG), Prompt Engineering & Fine-tuning, Agent Memory & State Management, Intent Classification, Agent Orchestration (LangGraph)
Agent Platforms & Tools:
Claude Code, OpenClaw, Cursor, Dify
Programming & ML:
Python, PyTorch, TensorFlow
Infrastructure & Deployment:
Linux, Docker, Local LLM Deployment, GPU/NPU Accelerated Inference, NVIDIA RTX 4090D, Huawei Ascend 910B
Languages:
Mandarin (Native), English (Fluent), Cantonese (Basic)