Xihang Dai

dxh2723192626@gmail.com  ·  Hong Kong
GitHub: https://github.com/HA7CH

EDUCATION
=========

Beijing University of Posts and Telecommunications   (Sep 2019 - Jun 2023)
Bachelor of Engineering, Telecommunications Engineering and Management

EXPERIENCE
==========

China Unicom Global Limited — Assistant AI Engineer   (Sep 2023 - Present)
  - Designed and implemented end-to-end Python pipelines that ingest, preprocess, and index multi-format knowledge sources, using Surya OCR for text extraction, to build a unified enterprise knowledge base.
  - Built an LLM-based RAG backend using FAISS, BGE embeddings, BGE reranker, and Qwen2.5-72B for answer generation, significantly improving answer accuracy.
  - Deployed Docker-based containerised model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B), ensuring cross-platform compatibility.
  - Built a RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts for metric calculation and result summarisation.
  - Developed a multimodal document QA system integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing, enabling question answering for image-containing documents.
  - Fine-tuned a BERT-based classification model with PyTorch and Hugging Face to replace an LLM-only pipeline across six marketing use cases, reducing average latency from 4.5s to 300ms.
  - Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores, improving coherence in multi-turn reasoning.

PROJECTS
========

Intelligent Question Answering Platform — Enterprise Project   (Sep 2023 - Present)
  - Designed and implemented end-to-end Python pipelines that ingest, preprocess, and index multi-format knowledge sources (documents, structured/semi-structured data), using Surya OCR for text extraction.
  - Built RAG backend with FAISS vector store, BGE embeddings, BGE reranker, and Qwen2.5-72B, significantly improving answer accuracy.
  - Developed source traceability and fault-tolerance mechanisms, enhancing system trustworthiness and stability.
  - Deployed Docker-based model-serving stacks across heterogeneous GPU/NPU environments (NVIDIA RTX 4090D, Huawei Ascend 910B).
  - Built RAG evaluation workflow using synthetic QA datasets, LLM-as-a-judge scoring, and Claude Code-assisted scripts.

Media Center Knowledge Q&A and AI Writing Agent — Enterprise Project   (Jan 2024 - Present)
  - Developed a multimodal document QA system supporting scanned documents, image-only files, and mixed-content inputs by integrating InternVL2.5-78B, CLIP, and OCR-based preprocessing.
  - Fine-tuned and deployed a BERT-based classification model using PyTorch and Hugging Face, improving robustness and reducing average latency from 4.5s to 300ms across six marketing use cases.
  - Added web search to address stale model knowledge, improving the timeliness of generated responses.
  - Designed a hybrid agent memory and state-management mechanism combining short-term conversational state with long-term structured memory stores.

SKILLS
======

LLM & Agent Systems: Retrieval-Augmented Generation (RAG), Prompt Engineering & Fine-tuning, Agent Memory & State Management, Intent Classification, Agent Orchestration (LangGraph)
Agent Platforms & Tools: Claude Code, OpenClaw, Cursor, Dify
Programming & ML: Python, PyTorch, TensorFlow
Infrastructure & Deployment: Linux, Docker, Local LLM Deployment, GPU/NPU Accelerated Inference, NVIDIA RTX 4090D, Huawei Ascend 910B
Languages: Mandarin (Native), English (Fluent), Cantonese (Basic)

cv.ha7ch.com/daisy