Automation & AI Engineering

OCHA-RAG: AI Document Analysis System for Humanitarian Crisis Response

Built a RAG-powered document analysis system for UN OCHA analysts — processes large PDF collections, enables natural language querying, and generates structured analytical reports with page-level citations using GPT-4, Haystack, and LlamaIndex.

4 languages

EN, FR, AR, DE support

RAG

Retrieval-augmented generation

Page-level

Citation tracking

Pilot

Active with UN OCHA

The Challenge

UN OCHA humanitarian analysts needed to rapidly analyse large collections of PDF documents during natural disaster and conflict situations:

  • Analysts receive hundreds of PDF documents per crisis — far too many to read manually under time pressure
  • Reports needed to be generated in multiple languages (English, French, Arabic, German)
  • Every analytical claim required verifiable, page-level citations back to source documents
  • Different OCHA offices needed data isolation to prevent cross-contamination of sensitive information
  • Existing search tools could not understand natural language queries across heterogeneous document collections
  • Generated reports needed to be exported as Word and PDF documents for distribution

The Solution

I designed and built a RAG pipeline that turns raw PDFs into a queryable knowledge base and generates structured reports from it:

PDF Ingestion Pipeline

LlamaParse extracts text from complex PDF documents, handling multi-column layouts, tables, and embedded images across multiple languages.
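Page-level citations only work if page numbers survive ingestion. The following is a minimal sketch of that idea, not the production code: it assumes the parser has already produced one text string per page, and the names `Chunk` and `chunk_pages` and the word-window sizes are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # hypothetical identifier for the source PDF
    page: int    # 1-based page number the text came from
    text: str


def chunk_pages(doc_id: str, pages: list[str],
                max_words: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split each page into overlapping word windows.

    Windows never cross a page boundary, so every chunk maps to exactly
    one page and can be cited at page level later.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    step = max_words - overlap
    chunks: list[Chunk] = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), step):
            window = words[start:start + max_words]
            chunks.append(Chunk(doc_id, page_number, " ".join(window)))
            if start + max_words >= len(words):
                break  # this window already reached the end of the page
    return chunks


if __name__ == "__main__":
    demo = chunk_pages("sitrep.pdf", ["a b c d e f g h i j", "", "k l"],
                       max_words=4, overlap=1)
    for chunk in demo:
        print(chunk.page, repr(chunk.text))
```

Keeping chunks inside a single page trades a little retrieval context for an unambiguous citation target. Empty pages produce no chunks.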

Vector Search & Retrieval

Documents are chunked, vectorised with Haystack 2.x, and stored in PostgreSQL with pgvector — enabling fast semantic similarity search with reranking.
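To illustrate what semantic similarity search computes, here is a small in-memory sketch of top-k retrieval by cosine similarity. It is a stand-in for the database query, not the deployed implementation: in PostgreSQL, pgvector exposes cosine distance through its `<=>` operator, whereas the functions `cosine_similarity` and `top_k` and the toy two-dimensional vectors below are assumptions made for illustration. The reranking step is not shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          indexed: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in indexed]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones come from an embedding model.
    index = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    print(top_k([1.0, 0.0], index, k=2))
```

A reranker would take these candidates and reorder them with a more expensive relevance model before they reach the generation step.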

AI Report Generation

LlamaIndex orchestrates GPT-4 to generate structured analytical reports from retrieved document chunks, with automatic page-level citation tracking.
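One way to make citations checkable is to number the retrieved chunks in the prompt and then map the markers in the generated text back to document and page. The sketch below shows that pattern under stated assumptions: the `[n]` marker format, the prompt wording, and the names `Source`, `build_prompt`, and `resolve_citations` are hypothetical, and the model call itself is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    doc_id: str
    page: int
    text: str


def build_prompt(question: str, sources: list[Source]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    lines = ["Answer using only the numbered sources. Cite each claim as [n].", ""]
    for number, source in enumerate(sources, start=1):
        lines.append(f"[{number}] ({source.doc_id}, p. {source.page}) {source.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


CITATION = re.compile(r"\[(\d+)\]")


def resolve_citations(answer: str, sources: list[Source]):
    """Map [n] markers to (doc_id, page).

    Returns (resolved, invalid): resolved holds unique references in order
    of first appearance; invalid holds marker numbers with no matching source.
    """
    resolved: list[tuple[str, int]] = []
    invalid: list[int] = []
    for match in CITATION.finditer(answer):
        number = int(match.group(1))
        if 1 <= number <= len(sources):
            ref = (sources[number - 1].doc_id, sources[number - 1].page)
            if ref not in resolved:
                resolved.append(ref)
        else:
            invalid.append(number)
    return resolved, invalid


if __name__ == "__main__":
    retrieved = [Source("sitrep.pdf", 4, "Displacement figures rose."),
                 Source("flash.pdf", 12, "Main roads are cut.")]
    print(build_prompt("What changed this week?", retrieved))
    print(resolve_citations("Displacement rose [1]. Roads are cut [2]. Aid came [5].",
                            retrieved))
```

Markers that point at no retrieved source are returned separately so that a report containing them can be flagged or regenerated rather than published.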

Multi-Language Support

Full support for English, French, Arabic, and German — both for document ingestion and report generation.

Office-Level Isolation

Each OCHA office operates in an isolated data environment, ensuring sensitive humanitarian information stays properly compartmentalised.
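Isolation can be enforced in several ways, including separate databases, separate schemas, or row-level scoping. The sketch below shows only the last option as an illustration and does not describe how the deployed system is configured. `StoredChunk`, `OfficeScopedStore`, and the office identifiers are hypothetical, and keyword matching stands in for vector search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredChunk:
    office_id: str  # hypothetical tenant key, one per office
    doc_id: str
    page: int
    text: str


class OfficeScopedStore:
    """In-memory stand-in for a chunk table where every read is office-scoped."""

    def __init__(self) -> None:
        self._rows: list[StoredChunk] = []

    def add(self, chunk: StoredChunk) -> None:
        self._rows.append(chunk)

    def search(self, office_id: str, keyword: str) -> list[StoredChunk]:
        # Refuse unscoped queries instead of silently searching everything.
        if not office_id:
            raise ValueError("office_id is required for every query")
        needle = keyword.lower()
        return [row for row in self._rows
                if row.office_id == office_id and needle in row.text.lower()]


if __name__ == "__main__":
    store = OfficeScopedStore()
    store.add(StoredChunk("office-a", "sitrep.pdf", 4, "Flood damage in the north"))
    store.add(StoredChunk("office-b", "flash.pdf", 2, "Flood response funding"))
    print(store.search("office-a", "flood"))
```

Making the office identifier a required argument of the only read path means a caller cannot forget the filter.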

Export & Distribution

Generated reports can be exported as Word or PDF documents, ready for distribution to stakeholders and decision-makers.

Results

The system is currently in its pilot phase and is being tested in active humanitarian situations:

Pilot active

Being tested in natural disaster and multi-year conflict situations by OCHA analysts

Hours → minutes

Analysts can query hundreds of documents and receive structured reports in minutes

Verifiable citations

Every claim in generated reports links back to specific pages in source documents

Multi-language

Supports document analysis and report generation across 4 languages

Technology Stack

Frontend

  • React

Backend

  • FastAPI
  • Python

AI & ML

  • GPT-4
  • LlamaIndex
  • Haystack 2.x
  • LiteLLM
  • LlamaParse

Database & Infrastructure

  • PostgreSQL
  • pgvector
  • Docker
