OCHA-RAG: AI Document Analysis System for Humanitarian Crisis Response
Built a RAG-powered document analysis system for UN OCHA analysts — processes large PDF collections, enables natural language querying, and generates structured analytical reports with page-level citations using GPT-4, Haystack, and LlamaIndex.
EN, FR, AR, DE support
Retrieval-augmented generation
Citation tracking
Active with UN OCHA

The Challenge
UN OCHA humanitarian analysts needed to rapidly analyse large collections of PDF documents during natural disaster and conflict situations:
- Analysts receive hundreds of PDF documents per crisis — far too many to read manually under time pressure
- Reports needed to be generated in multiple languages (English, French, Arabic, German)
- Every analytical claim required verifiable, page-level citations back to source documents
- Different OCHA offices needed data isolation to prevent cross-contamination of sensitive information
- Existing search tools could not understand natural language queries across heterogeneous document collections
- Generated reports needed to be exported as Word and PDF documents for distribution
The Solution
I designed and built a RAG pipeline that turns raw PDFs into a queryable knowledge base and generates structured reports from it:
PDF Ingestion Pipeline
LlamaParse extracts text from complex PDF documents, handling multi-column layouts, tables, and embedded images across multiple languages.
Vector Search & Retrieval
Documents are chunked, vectorised with Haystack 2.x, and stored in PostgreSQL with pgvector — enabling fast semantic similarity search with reranking.
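The chunk-then-retrieve idea can be illustrated with a minimal, self-contained sketch, assuming page-level text has already been extracted. This is not the project's Haystack or pgvector code. The `embed` function is a hypothetical toy (hashed bag-of-words) standing in for a real embedding model, and the rerank step is a simple term-overlap score standing in for a learned reranker. What the sketch does show is that every chunk keeps its document name and page number, which is what later makes page-level citations possible.

```python
import math
import re
import zlib
from dataclasses import dataclass

DIM = 256  # size of the toy embedding (assumption, not a real model setting)


@dataclass
class Chunk:
    doc: str
    page: int  # 1-based page number, kept for page-level citations
    text: str
    vector: list[float]


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware in Python, so this also splits French, German or Arabic words.
    return re.findall(r"\w+", text.lower())


def embed(text: str) -> list[float]:
    """Hypothetical toy embedding: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk_pages(doc: str, pages: list[str], size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split each page into overlapping word windows that never cross a page boundary."""
    step = size - overlap
    chunks = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), step):
            window = " ".join(words[start:start + size])
            chunks.append(Chunk(doc, page_no, window, embed(window)))
            if start + size >= len(words):
                break  # the last window already reaches the end of the page
    return chunks


def similarity(a: list[float], b: list[float]) -> float:
    # Both vectors are L2-normalised, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, chunks: list[Chunk], k: int = 20, final_k: int = 5) -> list[Chunk]:
    q = embed(query)
    # Stage 1: vector similarity search (the job pgvector does in the real system).
    candidates = sorted(chunks, key=lambda c: similarity(q, c.vector), reverse=True)[:k]
    # Stage 2: rerank the candidates with a toy term-overlap score,
    # standing in for a learned reranker.
    q_terms = set(tokenize(query))
    return sorted(candidates, key=lambda c: len(q_terms & set(tokenize(c.text))), reverse=True)[:final_k]


# Usage with invented sample text:
chunks = chunk_pages("sitrep_03.pdf", [
    "Flooding displaced families in the northern district.",
    "Cholera cases rose in two camps; water trucking is underway.",
])
top = retrieve("cholera cases in camps", chunks, final_k=1)
print(top[0].doc, top[0].page)  # sitrep_03.pdf 2
```

Keeping chunk windows inside a single page trades a little context at page boundaries for an unambiguous page number on every retrieved passage.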
AI Report Generation
LlamaIndex orchestrates GPT-4 to generate structured analytical reports from retrieved document chunks, with automatic page-level citation tracking.
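One common way to get page-level citations out of an LLM is to number the retrieved chunks in the prompt, ask the model to cite them as [n], and map those markers back to document and page after generation. The sketch below illustrates that pattern under stated assumptions. It is not LlamaIndex's API or the project's actual prompt: `build_prompt`, `resolve_citations` and the prompt wording are hypothetical, and the GPT-4 call is omitted, with a made-up response string in its place.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    doc: str
    page: int
    text: str


def build_prompt(question: str, sources: list[Source]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    lines = [f"[{i}] ({s.doc}, p. {s.page}) {s.text}" for i, s in enumerate(sources, start=1)]
    return (
        "Answer using only the sources below. "
        "Support every claim with a citation such as [1].\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


def resolve_citations(answer: str, sources: list[Source]) -> tuple[list[tuple[int, str, int]], list[int]]:
    """Map [n] markers in the model's answer back to (n, document, page).

    Returns the resolved citations and any markers that point at no source.
    """
    resolved, invalid = [], []
    for n in sorted({int(m) for m in re.findall(r"\[(\d+)\]", answer)}):
        if 1 <= n <= len(sources):
            resolved.append((n, sources[n - 1].doc, sources[n - 1].page))
        else:
            invalid.append(n)
    return resolved, invalid


# Usage with invented sources and a made-up model response:
sources = [
    Source("sitrep_03.pdf", 2, "Cholera cases rose in two camps."),
    Source("needs_assessment.pdf", 7, "Water trucking is the main supply in both camps."),
]
answer = "Cholera is spreading in the camps [1], and water supply depends on trucking [2][5]."
print(resolve_citations(answer, sources))
# ([(1, 'sitrep_03.pdf', 2), (2, 'needs_assessment.pdf', 7)], [5])
```

Markers that point at no retrieved source (such as [5] above) are worth surfacing to the analyst rather than silently dropping, since they flag a claim the retrieved documents do not support.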
Multi-Language Support
Full support for English, French, Arabic, and German — both for document ingestion and report generation.
Office-Level Isolation
Each OCHA office operates in an isolated data environment, ensuring sensitive humanitarian information stays properly compartmentalised.
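Office-level isolation can be enforced at query time in several ways. Here is a minimal sketch of one, assuming every stored chunk carries an office identifier. The table and column names (`chunks`, `office_id`, `embedding`) and the `scoped_search` helper are hypothetical. The SQL string is shown for illustration only and is not executed; passing the query vector as a parameter assumes a driver such as psycopg with pgvector's Python adapter registered. The in-memory function applies the same rule: filter to the caller's office first, then rank.

```python
import math
from dataclasses import dataclass

# Hypothetical schema: a `chunks` table with `office_id`, `doc`, `page`, `text`
# and a pgvector `embedding` column. `<=>` is pgvector's cosine-distance operator.
SCOPED_QUERY = """
SELECT doc, page, text
FROM chunks
WHERE office_id = %(office_id)s
ORDER BY embedding <=> %(query_vec)s
LIMIT %(k)s
"""


@dataclass(frozen=True)
class Row:
    office_id: str
    doc: str
    page: int
    embedding: tuple[float, ...]


def cosine_distance(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat zero vectors as unrelated instead of dividing by zero
    return 1.0 - dot / (na * nb)


def scoped_search(rows: list[Row], office_id: str, query_vec, k: int = 5) -> list[Row]:
    """In-memory equivalent of SCOPED_QUERY: filter by office *before* ranking."""
    if not office_id:
        raise ValueError("office_id is required; unscoped searches are refused")
    visible = [r for r in rows if r.office_id == office_id]
    return sorted(visible, key=lambda r: cosine_distance(query_vec, r.embedding))[:k]


# Usage with invented rows: office-b's identical chunk is never returned to office-a.
rows = [
    Row("office-a", "a_sitrep.pdf", 1, (1.0, 0.0)),
    Row("office-b", "b_sitrep.pdf", 4, (1.0, 0.0)),
]
hits = scoped_search(rows, "office-a", (1.0, 0.0))
print([r.doc for r in hits])  # ['a_sitrep.pdf']
```

Putting the office filter in the same query as the similarity ranking means another office's chunks can neither appear in nor displace the results. PostgreSQL row-level security is a stricter alternative that enforces the rule inside the database. One caveat worth checking against the pgvector documentation: with an approximate (HNSW or IVFFlat) index the filter is applied after the index scan, so a selective office filter can return fewer than k rows.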
Export & Distribution
Generated reports can be exported as Word or PDF documents, ready for distribution to stakeholders and decision-makers.
Results
The system is currently in a pilot phase with OCHA analysts:
- Being tested in natural disaster and multi-year conflict situations
- Analysts can query hundreds of documents and receive structured reports in minutes
- Every claim in a generated report links back to specific pages in the source documents
- Document analysis and report generation are supported in four languages
Technology Stack
Frontend
- React
Backend
- FastAPI
- Python
AI & ML
- GPT-4
- LlamaIndex
- Haystack 2.x
- LiteLLM
- LlamaParse
Database & Infrastructure
- PostgreSQL
- pgvector
- Docker