Methodology & Data Sources
How we track 100+ entities across 26 years using only public records
Goblin House is an investigative intelligence platform. Every fact, connection, and conflict-of-interest finding is sourced from credible public databases, government filings, and primary documents, never from rumour or anonymous sources.

This platform tracks individuals, companies, government contractors, and financial instruments connected to the Palantir Technologies ecosystem, Israeli defense technology networks, and associated contractor networks over the period 1998–2024.

The core investigative question: who benefits from Palantir's government contracts, and what financial and political relationships underpin those benefits? All data is drawn from sources accessible to any member of the public under freedom of information law or open-data mandates.

"The architecture of secrecy depends on the assumption that no one is looking. We are looking." — from the primary investigation document.

Every fact and connection in this system is assigned a confidence tier reflecting the quality and verifiability of its underlying source.

Primary: Government database with URL
Direct record from an official government system: USASpending.gov contract award, SEC EDGAR filing, FEC campaign finance disclosure, Companies House registration. Every such fact includes a direct URL to the source record.

Secondary: Verified press or corporate disclosure
Published reporting by credible investigative outlets (ProPublica, Reuters, Financial Times, Guardian), court filings with docket numbers, or corporate disclosures such as annual reports and proxy statements.

Inferential: AI-reasoned from known facts
Connections or conclusions reached by the Claude AI agent by reasoning across verified primary and secondary facts. These are always labelled inferential and are never presented as established fact without corroboration.

In the connection graph, primary relationships are shown as solid lines, inferential as dashed. In entity fact lists, each fact's confidence level is displayed alongside its source URL.
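
The three tiers can be modelled as a small enumeration attached to each fact. The following is a minimal sketch in Python, not the platform's actual code: the names `Confidence`, `Fact`, and `line_style` are illustrative assumptions. The text above gives a line style only for primary and inferential relationships, so the sketch leaves the secondary style unspecified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    PRIMARY = "primary"          # government database record with URL
    SECONDARY = "secondary"      # verified press or corporate disclosure
    INFERENTIAL = "inferential"  # AI-reasoned from known facts


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact record; field names are assumptions."""
    statement: str
    confidence: Confidence
    source_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Primary facts must carry a direct URL to the source record.
        if self.confidence is Confidence.PRIMARY and not self.source_url:
            raise ValueError("primary facts require a source URL")


# Line styles stated in the text; the secondary style is not given there.
LINE_STYLES = {
    Confidence.PRIMARY: "solid",
    Confidence.INFERENTIAL: "dashed",
}


def line_style(confidence: Confidence) -> Optional[str]:
    """Return the graph line style for a tier, or None if unspecified."""
    return LINE_STYLES.get(confidence)
```

Rejecting a primary fact without a URL at construction time is one way to enforce the rule that every primary fact links to its source record.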

The following sources are ingested automatically via public APIs on a nightly schedule. No proprietary databases are used. All data is available free of charge to any member of the public.

Source | Data Type | Coverage | API
USASpending.gov | Federal contract awards, grants, loans | All US federal agencies · 2000–present | api.usaspending.gov (no key required)
SEC EDGAR | 10-K/10-Q/8-K filings, ownership disclosures (Form 4/SC 13G), proxy statements | All public companies · 1993–present | efts.sec.gov (no key required)
FEC (Federal Election Commission) | Individual and PAC campaign contributions · committee receipts | US federal elections · 2002–present | api.open.fec.gov
ProPublica / Senate LDA | Federal lobbying disclosures · registrant and client data | All registered lobbyists · 1999–present | lda.senate.gov
UK Companies House | Company registrations, officer appointments, filings, persons of significant control | All UK-registered companies · 1844–present | Companies House API
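
The source table above could be held as a small registry that the nightly job iterates over. This is a sketch under assumptions: the `DataSource` class, its field names, and the `keyless_sources` helper are hypothetical. The `key_required` flag is set only where the table states it and is left as `None` otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataSource:
    """Hypothetical registry entry mirroring one row of the table."""
    name: str
    data_type: str
    coverage: str
    api: str
    # True/False only where the table says so; None means "not stated".
    key_required: Optional[bool] = None


SOURCES: List[DataSource] = [
    DataSource("USASpending.gov",
               "Federal contract awards, grants, loans",
               "All US federal agencies, 2000–present",
               "api.usaspending.gov", key_required=False),
    DataSource("SEC EDGAR",
               "10-K/10-Q/8-K filings, ownership disclosures, proxy statements",
               "All public companies, 1993–present",
               "efts.sec.gov", key_required=False),
    DataSource("FEC",
               "Individual and PAC campaign contributions, committee receipts",
               "US federal elections, 2002–present",
               "api.open.fec.gov"),
    DataSource("ProPublica / Senate LDA",
               "Federal lobbying disclosures, registrant and client data",
               "All registered lobbyists, 1999–present",
               "lda.senate.gov"),
    DataSource("UK Companies House",
               "Registrations, officer appointments, filings, PSC data",
               "All UK-registered companies, 1844–present",
               "Companies House API"),
]


def keyless_sources(sources: List[DataSource]) -> List[str]:
    """Names of sources the table explicitly marks as needing no API key."""
    return [s.name for s in sources if s.key_required is False]
```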

The platform includes an AI research agent powered by Claude (Anthropic). The agent's role is strictly limited to:

  • Synthesis: Summarising and cross-referencing verified facts from primary sources to identify patterns.
  • Conflict detection: Identifying potential conflicts of interest, undisclosed relationships, or regulatory capture scenarios worthy of further investigation.
  • Connection mapping: Inferring plausible relationships between entities based on overlapping facts (marked as inferential confidence).
  • Entity discovery: Identifying new entities mentioned in contract data or documents that may warrant tracking.

The agent operates within hard cost limits ($25/day) and rate limits (8 API calls/minute). All AI-generated content is clearly labelled. The agent does not have access to non-public data sources and cannot send external communications.
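
A guard of the following shape is one way such limits could be enforced before each agent call. It is a minimal sketch, not the platform's implementation: the `AgentLimiter` class, the sliding 60-second window, and the per-call cost estimate are assumptions, and how the daily budget is reset is left to the caller.

```python
import time
from collections import deque
from typing import Callable, Deque


class AgentLimiter:
    """Hypothetical guard: per-minute call cap plus a daily spend cap."""

    def __init__(self, max_calls_per_minute: int = 8,
                 daily_budget_usd: float = 25.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls_per_minute
        self.daily_budget = daily_budget_usd
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of recent calls
        self.spent_today = 0.0

    def allow(self, estimated_cost_usd: float) -> bool:
        """Record and permit a call only if both limits still hold."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        if self.spent_today + estimated_cost_usd > self.daily_budget:
            return False
        self.calls.append(now)
        self.spent_today += estimated_cost_usd
        return True

    def reset_day(self) -> None:
        """Clear the spend counter; the caller decides when a day rolls over."""
        self.spent_today = 0.0
```

Passing the clock in as a parameter keeps the limiter testable without real waiting.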

Inferential findings are always presented as hypotheses for further investigation, not as established facts. Every finding includes the evidence chain from which it was derived.

When the system ingests new contract or financial data, it automatically extracts entity names mentioned in those records (contractors, awardees, investors, officers). These names are normalised and fuzzy-matched against the existing entity database.
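
The normalise-then-match step might look like the sketch below. The suffix list, the 0.9 similarity threshold, and the function names are assumptions for illustration, and the standard library's `difflib.SequenceMatcher` stands in for whichever fuzzy matcher the platform actually uses.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple

# Hypothetical list of legal suffixes to strip; the real list is not documented.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "plc", "co"}


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_match(name: str, known: Iterable[str],
               threshold: float = 0.9) -> Optional[Tuple[str, float]]:
    """Return the closest known entity and its score, or None below threshold."""
    target = normalise(name)
    best: Optional[Tuple[str, float]] = None
    for candidate in known:
        score = SequenceMatcher(None, target, normalise(candidate)).ratio()
        if best is None or score > best[1]:
            best = (candidate, score)
    if best is not None and best[1] >= threshold:
        return best
    return None
```

With this sketch, "PALANTIR TECHNOLOGIES, INC" and "Palantir Technologies Inc." normalise to the same string and therefore match, while an unrelated name falls below the threshold and is treated as a candidate new entity.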

If a new entity name is found that doesn't exist in the database and appears to be a significant company or individual (based on contract value or relationship depth), it is automatically added to the tracking database and queued for public data ingestion. This allows the network to grow organically as new connections are discovered.

All auto-discovered entities are flagged with their discovery method and source, allowing researchers to review and verify the relevance of newly added entities.
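
The discovery decision and its audit flags could be sketched as follows. The thresholds, field names, and the `consider_entity` function are hypothetical, since the document does not state the actual significance criteria beyond contract value and relationship depth.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the actual values are not documented.
MIN_CONTRACT_VALUE_USD = 1_000_000
MIN_RELATIONSHIPS = 2


@dataclass(frozen=True)
class DiscoveredEntity:
    """Hypothetical record for an auto-discovered entity."""
    name: str
    discovery_method: str   # how the name was found
    discovery_source: str   # the record the name was extracted from
    auto_discovered: bool = True


def consider_entity(name: str, already_tracked: bool,
                    contract_value_usd: float, relationship_count: int,
                    source: str) -> Optional[DiscoveredEntity]:
    """Return a flagged entity to queue for ingestion, or None to skip it."""
    if already_tracked:
        return None
    significant = (contract_value_usd >= MIN_CONTRACT_VALUE_USD
                   or relationship_count >= MIN_RELATIONSHIPS)
    if not significant:
        return None
    return DiscoveredEntity(name, "contract_extraction", source)
```

Storing the method and source on the record itself lets a reviewer trace why an entity entered the database.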

  • We do not use anonymous sources, hacked data, or leaked documents of unknown provenance.
  • We do not access private databases, proprietary data services, or paywalled information.
  • We do not present inferential AI reasoning as established fact.
  • We do not make legal determinations — we identify patterns worthy of journalistic or regulatory scrutiny.
  • We do not target private individuals who are not public figures or acting in a public-interest capacity.

Public API data is re-ingested nightly at 00:30 UTC, so new government contracts, SEC filings, and campaign finance disclosures are captured within 24 hours of appearing in the source APIs. Each entity's data freshness is tracked and displayed on the research dashboard.
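
The schedule and freshness check can be expressed with the standard library alone. This is a sketch under assumptions: the function names are hypothetical, callers are expected to pass timezone-aware datetimes, and the 24-hour staleness window is an illustrative choice rather than a documented setting.

```python
from datetime import datetime, time, timedelta, timezone

RUN_TIME_UTC = time(0, 30)               # nightly run at 00:30 UTC, per the text
FRESHNESS_WINDOW = timedelta(hours=24)   # assumed staleness threshold


def next_run(now: datetime) -> datetime:
    """Return the next 00:30 UTC run strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), RUN_TIME_UTC, tzinfo=timezone.utc)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def is_stale(last_ingested: datetime, now: datetime) -> bool:
    """True if an entity's data is older than the freshness window."""
    return now - last_ingested > FRESHNESS_WINDOW
```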

Source counts distinguish between Primary sources with direct government URLs and AI-generated sources produced by the research agent — giving a clear picture of how much of each entity's profile is grounded in primary records.
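
A per-entity tally of this kind might be computed as in the sketch below. The tier labels as plain strings and the `primary_share` ratio are assumptions for illustration; the document does not say how the dashboard derives or displays its counts.

```python
from collections import Counter
from typing import Dict, Iterable, List


def source_counts(tiers: Iterable[str]) -> Dict[str, int]:
    """Count facts per confidence tier for one entity."""
    counts = Counter(tiers)
    return {
        "primary": counts.get("primary", 0),
        "secondary": counts.get("secondary", 0),
        "inferential": counts.get("inferential", 0),  # AI-generated
    }


def primary_share(tiers: List[str]) -> float:
    """Fraction of an entity's facts grounded in primary records."""
    if not tiers:
        return 0.0
    return source_counts(tiers)["primary"] / len(tiers)
```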

If you believe a fact in this database is incorrect, or if you are a subject of this investigation who wishes to provide additional context or correction, please refer to the contact information in the primary investigation document.

We maintain a corrections log and will update any confirmed errors within 48 hours. Corrections are noted in the entity's fact record with the date and nature of the correction.