
Fatos Eleitorais

React Next.js Node.js TypeScript PostgreSQL Python Fastify

Overview

Fatos Eleitorais is a political transparency platform that takes long-form interviews and debates and turns them into searchable statements with exact timestamps and a complete trail back to the original video. The goal is to make it trivial to answer “did this candidate really say that?” with a direct link to the precise second in the source video, plus date, context, and topic.

The pipeline starts from the raw video (e.g. YouTube), goes through automatic transcription, and then an AI worker segments the content into question–answer blocks, extracts the relevant statements, and organizes them by candidate, topic, and type of statement (promise, assessment, proposal, etc.). Every segment goes through a human review panel before publication so that LLM errors never become “official truth”.
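The "exact second" guarantee described above comes down to storing a start offset per statement and emitting a timestamped link back to the source. A minimal sketch, assuming YouTube as the source (the function name is illustrative; the `t` query parameter is YouTube's documented way to deep-link into a video):

```python
def deep_link(video_id: str, start_seconds: int) -> str:
    """Build a link that opens the source video at the statement's start time.

    YouTube honors a `t` query parameter with a seconds offset, so every
    published statement can point at the precise moment it was said.
    """
    return f"https://www.youtube.com/watch?v={video_id}&t={start_seconds}s"

# A statement extracted 1h02m15s into an interview:
url = deep_link("VIDEOID123", 3735)
```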

This is a solo project. I designed the architecture, built the Next.js frontend, the Fastify + TypeScript API, the Python AI worker, the PostgreSQL-based job queue, and the production infrastructure and CI/CD pipeline. The public beta is live with ingestion of interviews, statement search, an electoral data model, and a curation interface.

Although the full launch is still ahead, the system already runs on an architecture built for national-scale elections: concurrent workers, chained jobs, crash-resilient reprocessing, and a data model that supports elections, offices, candidacies, and a chronological view of each candidate’s discourse.

Key Differentiator

The core differentiator is not “yet another video search”, but the combination of three things:

  1. Semantic segmentation of long interviews where the basic unit is “question + answer”, not arbitrary time cuts.
  2. An AI pipeline that extracts both statements and explicit electoral promises, always tied back to the exact second in the original video.
  3. Human curation built into the core flow as a required step, not an optional extra, so the final published content is auditable and trustworthy.

Technically, the project stands out by solving, with a lean architecture, problems that are usually tackled with several extra components (a message broker, a dedicated embeddings service, a standalone notification system). PostgreSQL acts as both the job queue and the event backbone, Python workers claim jobs atomically, and the frontend talks to the API exclusively through Next.js Server Actions, behind a strict CSP.
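The queue pattern referred to here is PostgreSQL's `FOR UPDATE SKIP LOCKED` claim: each worker atomically marks one pending job as running inside a transaction, and concurrently polling workers simply skip rows that another transaction has locked. A hedged sketch of what such a claim query could look like (table and column names are assumptions, not the project's actual schema):

```python
# Illustrative claim query for a PostgreSQL-backed job queue.
# `jobs`, `status`, and `run_at` are assumed names, not the real schema.
CLAIM_JOB_SQL = """
UPDATE jobs
   SET status = 'running', started_at = now()
 WHERE id = (
         SELECT id
           FROM jobs
          WHERE status = 'pending'
            AND run_at <= now()
          ORDER BY run_at
          LIMIT 1
          FOR UPDATE SKIP LOCKED  -- concurrent workers skip locked rows
       )
RETURNING id, job_type, payload;
"""

def claim_job(cur):
    """Run the claim inside the caller's transaction; returns a row or None.

    `cur` is any DB-API cursor connected to PostgreSQL; if no pending job
    is available (or all are locked by other workers), fetchone() is None.
    """
    cur.execute(CLAIM_JOB_SQL)
    return cur.fetchone()
```

Because the locked row is invisible to other claimers only for the duration of the transaction, the worker should commit (or roll back) promptly after claiming.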

The segmentation pipeline uses a reasoning-capable LLM to follow the narrative of the debate rather than relying on line breaks or speaker tags in the transcript. That allows the system to handle imperfect ASR output and questions or answers that cross chunk boundaries.
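Segments that cross chunk boundaries can be stitched back together with a simple heuristic: small time gap plus similar question text. The sketch below is illustrative only, with assumed names throughout, and uses `difflib.SequenceMatcher` as a stand-in for the real semantic-similarity check:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Segment:
    question: str
    answer: str
    start: float  # seconds into the video
    end: float

def should_merge(a: Segment, b: Segment,
                 max_gap: float = 2.0, min_sim: float = 0.8) -> bool:
    """Two segments from adjacent transcript chunks likely belong together
    when the time gap between them is small and their questions look alike.
    (SequenceMatcher stands in for an embedding-based similarity score.)"""
    gap = b.start - a.end
    sim = SequenceMatcher(None, a.question, b.question).ratio()
    return 0 <= gap <= max_gap and sim >= min_sim

def merge(a: Segment, b: Segment) -> Segment:
    """Reconstruct an answer that was split across a chunk boundary."""
    return Segment(a.question, a.answer + " " + b.answer, a.start, b.end)
```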

Architecture

  • Frontend (Next.js 16 / React 19): Renders search, candidate pages, timeline of statements, and the review dashboard; all write operations go through Server Actions that call the internal REST API from the server side.
  • API Gateway (Fastify 5 + TypeScript): Exposes internal REST endpoints for video ingestion, job management, querying statements, promises, topics, and electoral structure; validates all payloads with Zod and accesses the database through Drizzle ORM.
  • Database (PostgreSQL): Stores the full electoral domain model (elections, offices, candidates, mandates), videos, transcripts, segments, statements, promises, user alerts, and also acts as a job queue and notification channel.
  • AI Worker (Python 3.12 + LangChain/LangGraph): Consumes jobs from the queue (segmentation, statement extraction, promise extraction), calls an LLM for text analysis, and writes enriched segments back to PostgreSQL.
  • Embeddings Service: Generates vector representations for statement snippets to detect duplicates and prepare for semantic search; currently used mainly for dedup/merge logic.
  • File Storage: Stores SRT files, transcripts, and intermediate processing artifacts, decoupled from the relational database.
  • E-mail System: Sends notifications to users following specific candidates or topics whenever new statements are published.
  • Infrastructure & Deploy (Docker + GitHub Actions + Cloudflare): Each service (API, frontend, worker) has its own Dockerfile; GitHub Actions pipelines handle builds, tests, DB migrations, and deployments; Cloudflare sits in front for proxying, caching static assets, and enforcing security policies.

Technical Highlights

  • Implemented a full job queue on top of PostgreSQL using FOR UPDATE SKIP LOCKED for atomic job acquisition by multiple workers, avoiding the need for a separate message broker like RabbitMQ or Kafka.
  • Designed a chained job pipeline to process videos up to 2 hours long (segmentation → statement extraction → promise extraction) with exponential backoff retries and automatic recovery of “stuck” running jobs after worker crashes.
  • Used a reasoning-capable LLM to semantically segment interviews into question–answer blocks, independent of the chunk boundaries produced by the transcription step.
  • Built a cross-chunk merge algorithm that uses time gaps, chunk boundaries, and semantic similarity of questions to reconstruct continuous segments that were split across transcript chunks.
  • Structured the frontend on Next.js 16 Server Actions to keep all API calls on the server, enforcing a stricter CSP and avoiding direct exposure of internal REST endpoints to the browser.
  • Modeled a normalized electoral domain in PostgreSQL (elections, offices, candidates, mandates, promises, topics) to support efficient queries by candidate, topic, time range, and type of statement.
  • Set up a Docker + GitHub Actions deployment pipeline that builds and ships all three services (API, frontend, worker), runs Drizzle migrations, and publishes to a production environment behind Cloudflare.
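The chained pipeline, backoff, and stuck-job recovery described above can be reduced to three small pieces of pure logic. A hedged sketch, with assumed stage names and timeouts (the real values and job types are the project's, not shown here):

```python
from datetime import datetime, timedelta

# Illustrative stage chain; the actual job-type names are assumptions.
NEXT_STAGE = {
    "segmentation": "statement_extraction",
    "statement_extraction": "promise_extraction",
    "promise_extraction": None,  # end of the pipeline
}

def backoff_delay(attempt: int, base: float = 30.0, cap: float = 3600.0) -> float:
    """Exponential backoff in seconds: 30s, 60s, 120s, ... capped at one hour."""
    return min(base * (2 ** attempt), cap)

def is_stuck(started_at: datetime, now: datetime,
             timeout: timedelta = timedelta(minutes=30)) -> bool:
    """A 'running' job whose worker crashed never reports back; once it is
    past the timeout, a periodic sweeper can reset it to 'pending' so it is
    reprocessed automatically."""
    return now - started_at > timeout
```

On successful completion of a job, the worker looks up `NEXT_STAGE` and enqueues the follow-up job in the same transaction, so a crash between stages never loses the chain.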