Methodology & Data Sources

How we track 100+ entities across 26 years using only public records

Goblin House is an investigative intelligence platform. Every fact, connection, and conflict-of-interest finding is sourced from credible public databases, government filings, and primary documents, never from rumour or anonymous sources.

Overview

This platform tracks individuals, companies, government contractors, and financial instruments connected to the Palantir Technologies ecosystem, Israeli defense technology networks, and associated government contractor networks from 1998 to 2024.

The core investigative question: who benefits from Palantir's government contracts, and what financial and political relationships underpin those benefits? All data is drawn from sources accessible to any member of the public under freedom of information law or open-data mandates.

"The architecture of secrecy depends on the assumption that no one is looking. We are looking." — from the primary investigation document.

Confidence Tiers

Every fact and connection in this system is assigned a confidence tier reflecting the quality and verifiability of its underlying source.

Primary

Government database with URL

Direct record from an official government system: USASpending.gov contract award, SEC EDGAR filing, FEC campaign finance disclosure, Companies House registration. Every such fact includes a direct URL to the source record.

Secondary

Verified press or corporate disclosure

Published reporting by credible investigative outlets (ProPublica, Reuters, Financial Times, Guardian), court filings with docket numbers, or corporate disclosures such as annual reports and proxy statements.

Inferential

AI-reasoned from known facts

Connections or conclusions reached by the Claude AI agent by reasoning across verified primary and secondary facts. These are always labelled inferential and are never presented as established fact without corroboration.

In the connection graph, primary relationships are shown as solid lines, inferential as dashed. In entity fact lists, each fact's confidence level is displayed alongside its source URL.
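The tier rules above can be sketched as a small data model. This is a minimal illustration, not the platform's actual schema: the names `Tier`, `Fact`, `EDGE_STYLES`, and `edge_style` are hypothetical. The text specifies line styles only for primary and inferential relationships, so the sketch leaves secondary unmapped rather than guess.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    PRIMARY = "primary"          # government database record with URL
    SECONDARY = "secondary"      # verified press or corporate disclosure
    INFERENTIAL = "inferential"  # AI-reasoned from known facts


@dataclass(frozen=True)
class Fact:
    text: str
    tier: Tier
    source_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Primary facts must carry a direct URL to the source record.
        if self.tier is Tier.PRIMARY and not self.source_url:
            raise ValueError("primary facts require a source URL")


# Only the two line styles stated in the text are mapped here.
EDGE_STYLES = {Tier.PRIMARY: "solid", Tier.INFERENTIAL: "dashed"}


def edge_style(tier: Tier) -> Optional[str]:
    """Return the graph line style for a tier, or None if unspecified."""
    return EDGE_STYLES.get(tier)
```

Rejecting a primary fact without a URL at construction time is one way to enforce the "every such fact includes a direct URL" rule before a record reaches the database.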

Public Data Sources

The following sources are ingested automatically via public APIs on a nightly schedule. No proprietary databases are used. All data is available free of charge to any member of the public.

Source	Data Type	Coverage	API
USASpending.gov	Federal contract awards, grants, loans	All US federal agencies · 2000–present	api.usaspending.gov (no key required)
SEC EDGAR	10-K/10-Q/8-K filings, ownership disclosures (Form 4/SC 13G), proxy statements	All public companies · 1993–present	efts.sec.gov (no key required)
FEC (Federal Election Commission)	Individual and PAC campaign contributions · committee receipts	US federal elections · 2002–present	api.open.fec.gov
ProPublica / Senate LDA	Federal lobbying disclosures · registrant and client data	All registered lobbyists · 1999–present	lda.senate.gov
UK Companies House	Company registrations, officer appointments, filings, persons of significant control	All UK-registered companies · 1844–present	Companies House API
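As an illustration of what a query against one of these sources might look like, the sketch below builds (but does not send) a contract-award search for the USASpending.gov API. The endpoint path, filter names, and field names reflect the public v2 API as best understood here and should be checked against the current API documentation. The helper name `build_award_search` is hypothetical and is not taken from the platform's code.

```python
import json
import urllib.request

# Assumed endpoint on the host listed in the table above.
USASPENDING_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/"


def build_award_search(recipient: str, start: str, end: str) -> urllib.request.Request:
    """Build (but do not send) a contract-award search for one recipient."""
    payload = {
        "filters": {
            "recipient_search_text": [recipient],
            "time_period": [{"start_date": start, "end_date": end}],
            "award_type_codes": ["A", "B", "C", "D"],  # contract award types
        },
        "fields": ["Award ID", "Recipient Name", "Award Amount"],
        "limit": 10,
        "page": 1,
    }
    return urllib.request.Request(
        USASPENDING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the search. Keeping construction separate from sending makes the payload easy to inspect and log.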

Autonomous Research Agent

The platform includes an AI research agent powered by Claude (Anthropic). The agent's role is strictly limited to:

Synthesis: Summarising and cross-referencing verified facts from primary sources to identify patterns.
Conflict detection: Identifying potential conflicts of interest, undisclosed relationships, or regulatory capture scenarios worthy of further investigation.
Connection mapping: Inferring plausible relationships between entities based on overlapping facts (marked as inferential confidence).
Entity discovery: Identifying new entities mentioned in contract data or documents that may warrant tracking.

The agent operates within hard cost limits ($25/day) and rate limits (8 API calls/minute). All AI-generated content is clearly labelled. The agent does not have access to non-public data sources and cannot send external communications.
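The two limits stated above could be enforced with a guard like the following. It is a sketch under assumptions: the class name `AgentLimiter`, the sliding 60-second window, and the per-call cost estimate are illustrative choices, and the platform's real enforcement mechanism is not described in the text.

```python
import time
from collections import deque
from typing import Optional

DAILY_BUDGET_USD = 25.0   # from the text: $25/day
MAX_CALLS_PER_MINUTE = 8  # from the text: 8 API calls/minute


class AgentLimiter:
    """Sliding-window rate limit plus a running daily spend cap."""

    def __init__(self) -> None:
        self.spent_today = 0.0
        self.call_times: deque = deque()

    def allow(self, estimated_cost: float, now: Optional[float] = None) -> bool:
        """Record and permit a call only if both limits would still hold."""
        now = time.monotonic() if now is None else now
        # Drop calls that have fallen out of the 60-second window.
        while self.call_times and now - self.call_times[0] >= 60.0:
            self.call_times.popleft()
        if len(self.call_times) >= MAX_CALLS_PER_MINUTE:
            return False
        if self.spent_today + estimated_cost > DAILY_BUDGET_USD:
            return False
        self.call_times.append(now)
        self.spent_today += estimated_cost
        return True

    def reset_day(self) -> None:
        """Clear the spend counter; intended to be called once per day."""
        self.spent_today = 0.0
```

The agent would call `allow(estimated_cost)` before each API request and skip or defer the request when it returns `False`.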

Inferential findings are always presented as hypotheses for further investigation, not as established facts. Every finding includes the evidence chain from which it was derived.

Entity Auto-Discovery

When the system ingests new contract or financial data, it automatically extracts entity names mentioned in those records (contractors, awardees, investors, officers). These names are normalised and fuzzy-matched against the existing entity database.

If an extracted name has no match in the database and appears to be a significant company or individual (based on contract value or relationship depth), it is automatically added to the tracking database and queued for public data ingestion. This lets the tracked network expand as new connections are discovered.

All auto-discovered entities are flagged with their discovery method and source, allowing researchers to review and verify the relevance of newly added entities.
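The normalise-then-fuzzy-match step described in this section can be sketched with the standard library. The suffix list, the 0.9 similarity threshold, and the function names `normalise` and `best_match` are assumptions made for illustration. The text does not say which matching algorithm or threshold the platform uses, and `difflib.SequenceMatcher` stands in for whatever it actually runs.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "plc"}


def normalise(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_match(name: str, known: Iterable[str], threshold: float = 0.9) -> Optional[str]:
    """Return the closest known entity at or above the threshold, else None."""
    target = normalise(name)
    best, best_score = None, 0.0
    for candidate in known:
        score = SequenceMatcher(None, target, normalise(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None
```

A `None` result marks the name as unmatched. Such a name would then go through the significance check on contract value or relationship depth before being added and flagged with its discovery method and source.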

What We Don't Do

We do not use anonymous sources, hacked data, or leaked documents of unknown provenance.
We do not access private databases, proprietary data services, or paywalled information.
We do not present inferential AI reasoning as established fact.
We do not make legal determinations — we identify patterns worthy of journalistic or regulatory scrutiny.
We do not target private individuals who are neither public figures nor acting in a public-interest capacity.

Data Freshness

Public API data is re-ingested nightly at 00:30 UTC. Each entity's data freshness is tracked and displayed on the research dashboard. The nightly ingestion schedule ensures that new government contracts, SEC filings, and campaign finance disclosures are captured within 24 hours of their public release.
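The schedule and freshness tracking above amount to two small calculations, sketched here. The function names `next_run` and `is_stale` are hypothetical, and the 24-hour staleness cut-off is an assumption derived from the nightly cycle rather than a documented rule. Both functions expect timezone-aware datetimes.

```python
from datetime import datetime, time, timedelta, timezone

RUN_TIME_UTC = time(0, 30)  # nightly run at 00:30 UTC, per the text


def next_run(now: datetime) -> datetime:
    """Return the next 00:30 UTC run strictly after `now`."""
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), RUN_TIME_UTC, tzinfo=timezone.utc)
    return candidate if candidate > now else candidate + timedelta(days=1)


def is_stale(last_ingested: datetime, now: datetime) -> bool:
    """Flag an entity whose data is older than one nightly cycle."""
    return now - last_ingested > timedelta(hours=24)
```

A dashboard could call `is_stale` per entity to highlight records that missed a nightly run.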

Source counts distinguish Primary sources, which carry direct government URLs, from AI-generated sources produced by the research agent. This shows how much of each entity's profile is grounded in primary records.
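A per-entity source count of this kind could be computed as follows. This is a rough sketch: it assumes each fact carries a tier label and that "AI-generated" corresponds to the inferential tier, which the text implies but does not state outright. The function name `source_counts` and the `primary_share` figure are illustrative additions.

```python
from collections import Counter
from typing import Iterable


def source_counts(tiers: Iterable[str]) -> dict:
    """Count an entity's facts by origin: primary records vs AI-generated."""
    counts = Counter(tiers)
    primary = counts.get("primary", 0)
    ai_generated = counts.get("inferential", 0)  # assumed mapping
    total = sum(counts.values())
    return {
        "primary": primary,
        "ai_generated": ai_generated,
        # Share of the profile grounded in primary records (0.0 if empty).
        "primary_share": primary / total if total else 0.0,
    }
```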

Contact & Corrections

If you believe a fact in this database is incorrect, or if you are a subject of this investigation who wishes to provide additional context or correction, please refer to the contact information in the primary investigation document.

We maintain a corrections log and will correct any confirmed errors within 48 hours. Each correction is noted in the entity's fact record with its date and nature.