Xihang Dai

AI Engineer — RAG pipelines and LLM-powered agent systems

Education

Beijing University of Posts and Telecommunications

09/2019 — 06/2023

Bachelor of Engineering, Telecommunications Engineering and Management

Experience

China Unicom Global Limited | Assistant AI Engineer

09/2023 — Present

AI engineer specialising in LLM-driven agent architectures and knowledge-centric systems, with hands-on experience in enterprise-scale deployment, evaluation, and heterogeneous (GPU/NPU) accelerator environments.
Took primary responsibility for end-to-end on-premises LLM deployment in isolated enterprise environments using Docker on GPU/NPU infrastructure; optimised inference performance for high-concurrency workloads and developed a comprehensive RAG evaluation framework.
Experienced with LangGraph and Dify for agent development; working knowledge of OpenClaw for agent runtime orchestration. Hands-on experience with state management, short- and long-term memory design, and multi-turn context maintenance.
Strong interest in long-term planning and self-evolving agent systems, with the goal of applying isolated-environment deployment experience to explore federated learning and privacy-enhancing techniques for agentic workflows.

Projects

Intelligent Question Answering Platform | Enterprise Knowledge Platform

09/2023 — Present

Designed and implemented end-to-end Python pipelines for ingesting, preprocessing, and indexing multi-format knowledge sources (documents, structured/semi-structured data), using Surya OCR for scanned and image-based content to build a unified enterprise knowledge base.
Built an LLM-based RAG backend using FAISS as the vector store, BGE embeddings for retrieval, BGE reranker for re-ranking, and Qwen2.5-72B for answer generation, significantly improving answer accuracy.
Developed source traceability and fault-tolerance mechanisms, enhancing system trustworthiness, stability, and usability.
Deployed Docker-based containerised model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B), ensuring cross-platform compatibility and scalable deployment.
Built a RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts for metric calculation and result summarisation, enabling efficient and objective assessment of retrieval quality and answer reliability.

Media Center Knowledge Q&A and AI Writing Agent | Multimodal Document QA / AI Writing

09/2023 — Present

Developed a multimodal document QA system supporting scanned documents, image-only files, and mixed-content inputs by integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing, so that questions can be answered over content embedded in images.
Fine-tuned and deployed a BERT-based classification model using PyTorch and Hugging Face to replace an LLM-only classification pipeline across six marketing use cases, improving robustness and reducing average latency from 4.5s to 300ms.
Integrated web search to compensate for the model's training-data cutoff, improving the timeliness of generated responses.
Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores for persistent context, significantly improving coherence and correctness in multi-turn, complex-task reasoning.

Skills

LLM & Agent Systems:

Retrieval-Augmented Generation (RAG), Prompt Engineering & Fine-tuning, Agent Memory & State Management, Intent Classification, Agent Orchestration (LangGraph)

Agent Platforms & Autonomous Tools:

Claude Code, OpenClaw, Cursor, Dify

Programming & Machine Learning:

Python, PyTorch, TensorFlow

Infrastructure, Deployment & Runtime:

Linux, Docker, Local LLM Deployment, GPU/NPU Accelerated Inference (NVIDIA RTX 4090D, Huawei Ascend 910B)

Languages:

Mandarin (Native), English (Fluent), Cantonese (Basic)

Certifications:

PCAD™ – Certified Associate Data Analyst with Python (OpenEDG Python Institute, 07/2025), Tencent Cloud Solution Architect Professional Engineer (11/2024)
