Xihang Dai
dxh2723192626@gmail.com · Hong Kong
GitHub: https://github.com/HA7CH

EDUCATION
=========
Beijing University of Posts and Telecommunications (Sep 2019 - Jun 2023)
Bachelor of Engineering, Telecommunications Engineering and Management

EXPERIENCE
==========
China Unicom Global Limited, Assistant AI Engineer (Sep 2023 - Present)
- Designed and implemented end-to-end Python pipelines for ingesting, preprocessing, and indexing multi-format knowledge sources using Surya OCR to build a unified enterprise knowledge base.
- Built an LLM-based RAG backend using FAISS, BGE embeddings, BGE reranker, and Qwen2.5-72B for answer generation, significantly improving answer accuracy.
- Deployed Docker-based containerised model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B), ensuring cross-platform compatibility.
- Built a RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts for metric calculation and result summarisation.
- Developed a multimodal document QA system integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing, enabling question answering for image-containing documents.
- Fine-tuned a BERT-based classification model with PyTorch and Hugging Face to replace an LLM-only pipeline across six marketing use cases, reducing average latency from 4.5s to 300ms.
- Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores, improving coherence in multi-turn reasoning.

PROJECTS
========
Intelligent Question Answering Platform, Enterprise Project (Sep 2023 - Present)
- Designed and implemented end-to-end Python pipelines for ingesting, preprocessing, and indexing multi-format knowledge sources (documents, structured/semi-structured data) using Surya OCR.
- Built a RAG backend with a FAISS vector store, BGE embeddings, a BGE reranker, and Qwen2.5-72B, significantly improving answer accuracy.
- Developed source traceability and fault-tolerance mechanisms, enhancing system trustworthiness and stability.
- Deployed Docker-based model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B).
- Built a RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts.

Media Center Knowledge Q&A and AI Writing Agent, Enterprise Project (Jan 2024 - Present)
- Developed a multimodal document QA system supporting scanned documents, image-only files, and mixed-content inputs by integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing.
- Fine-tuned and deployed a BERT-based classification model using PyTorch and Hugging Face, improving robustness and reducing average latency from 4.5s to 300ms across six marketing use cases.
- Integrated web search to address stale model knowledge, improving the timeliness of generated responses.
- Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores.

SKILLS
======
LLM & Agent Systems: Retrieval-Augmented Generation (RAG), Prompt Engineering & Fine-tuning, Agent Memory & State Management, Intent Classification, Agent Orchestration (LangGraph)
Agent Platforms & Tools: Claude Code, OpenClaw, Cursor, Dify
Programming & ML: Python, PyTorch, TensorFlow
Infrastructure & Deployment: Linux, Docker, Local LLM Deployment, GPU/NPU-Accelerated Inference, NVIDIA RTX 4090D, Huawei Ascend 910B
Languages: Mandarin (Native), English (Fluent), Cantonese (Basic)

cv.ha7ch.com/daisy