Tools

A practical map of the review, coverage, and reporting tools in the Mozilla Media Monitor. These tools all read from the same monitor data, but each answers a different analyst question.

Reporting

Coverage Report

Printable brief builder

Generates a ReportMule-style brief for a selected publication date range. This first version keeps the report narrow: Mozilla coverage, Firefox coverage, publisher tiers, successful scrapes, and short context summaries.

  • Category filter: includes only Mozilla in the News and Firefox in the News.
  • Quality filter: requires kept = true, scrape_failed = false, and usable raw_text.
  • Publisher filter: checks the article domain against the verified/discovery/watchlist registry in verified_publishers.js.
  • Summary logic: shows the first two stored context snippets containing Mozilla/Firefox mentions, falling back to snippets from extracted text.
  • Copy logic: each category has a one-click URL copy button for analyst handoff.
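The category and quality gates above can be sketched as a single predicate. This is a minimal sketch, not the tool's actual code: the field names (`kept`, `scrape_failed`, `raw_text`) come from the description above, while the exact category labels and the "usable text" threshold are assumptions.

```javascript
// Categories included in the brief (labels assumed from the filter description).
const REPORT_CATEGORIES = new Set(['Mozilla in the News', 'Firefox in the News']);

// Returns true when an article passes both the category and quality filters.
function passesReportFilters(article) {
  return (
    REPORT_CATEGORIES.has(article.category) &&
    article.kept === true &&
    article.scrape_failed === false &&
    typeof article.raw_text === 'string' &&
    article.raw_text.trim().length > 0 // "usable" text; threshold assumed
  );
}
```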

Report Compare

Analyst-report coverage comparison

Takes a pasted list of report URLs and compares them against monitor results for the selected date range. It is useful for seeing what the monitor found, what analysts already had, and what the monitor may have missed.

  • URL parsing: extracts URLs from pasted text and normalizes query strings, fragments, trailing slashes, and common www variants.
  • Monitor matching: compares pasted URLs against both url and normalized_url from Supabase.
  • Additions: surfaces kept monitor articles that were not in the pasted analyst list.
  • Publisher review: groups additions by verified, discovery, watchlist, and failed-scrape handling so analysts can review trust level quickly.
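A hypothetical version of the normalization step might look like the following: drop the query string and fragment, trim trailing slashes, strip a leading `www.`, and compare on host plus path only (which also makes http/https variants match — an assumption about how aggressive the real normalizer is).

```javascript
// Normalize a pasted URL down to a comparable host + path key.
function normalizeUrl(raw) {
  const u = new URL(raw);
  const host = u.hostname.toLowerCase().replace(/^www\./, ''); // common www variant
  const path = u.pathname.replace(/\/+$/, '') || '/';          // trailing slashes
  return `${host}${path}`;                                     // query + fragment dropped
}
```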

Coverage QA

Syndication Check

Duplicate and syndication discovery

Finds articles that are likely the same story republished across multiple outlets. It is read-only and does not hide, merge, or delete anything automatically.

  • Text fingerprinting: converts extracted full text into normalized word shingles.
  • Candidate search: uses MinHash-style signatures and banding to avoid comparing every article to every other article.
  • Similarity score: checks candidate pairs with Jaccard overlap and groups matches using union-find clustering.
  • Review clue: displays shared phrases so the analyst can confirm whether the cluster is truly syndicated.
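The fingerprinting and scoring steps can be sketched in a few lines. This shows only the exact shingle-plus-Jaccard comparison; the real tool adds MinHash signatures and banding so it never has to compare every pair, and the shingle size here is an assumption.

```javascript
// Convert text into a set of normalized k-word shingles.
function shingles(text, k = 5) {
  const words = text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/).filter(Boolean);
  const out = new Set();
  for (let i = 0; i + k <= words.length; i++) out.add(words.slice(i, i + k).join(' '));
  return out;
}

// Jaccard similarity: shared shingles over total distinct shingles.
function jaccard(a, b) {
  let inter = 0;
  for (const s of a) if (b.has(s)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}
```

Pairs scoring above a similarity threshold would then be merged into clusters (the union-find step).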

Feed Coverage Audit

Google News RSS query QA

Compares a combined Google News RSS search against individual site-specific searches. This helps detect whether a broad batch feed is missing stories that appear when a publisher is queried directly.

  • Input: accepts up to 25 domains, with quick samples for high-volume and PCMag-style publisher groups.
  • Comparison: calls the feed-audit API to fetch combined and individual Google News RSS result sets.
  • Risk signal: flags individual feeds that hit the 100-entry cap or have titles absent from the combined feed.
  • Outcome: provides sample missing titles so feed batches can be split or tuned.
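The core comparison reduces to a set difference over titles plus the cap check. A minimal sketch, assuming case-insensitive title matching (the real matcher may normalize more aggressively):

```javascript
const RSS_ENTRY_CAP = 100; // the Google News RSS result cap cited above

// Which titles from a site-specific feed are absent from the combined feed?
function auditFeed(combinedTitles, individualTitles) {
  const seen = new Set(combinedTitles.map(t => t.trim().toLowerCase()));
  return {
    hitCap: individualTitles.length >= RSS_ENTRY_CAP,
    missing: individualTitles.filter(t => !seen.has(t.trim().toLowerCase())),
  };
}
```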

Source Health

Configured feed visibility

Shows configured RSS/search feeds alongside recent monitor outcomes. It answers whether a source is active, quiet, generating useful kept articles, or producing scrape failures.

  • Configured feeds: reads data/feeds.csv and groups rows by source label and type.
  • Recent activity: pulls recent articles from Supabase and maps them back through found_in_feeds.
  • Status: separates active, quiet, partial, and failure-prone sources for quick triage.
  • Source admin: includes separate tools for adding new feeds and for managing publisher tiers with reviewer metadata.
  • Security: saving feed or tier changes requires GITHUB_CONTENTS_TOKEN with repository Contents write access.
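The status buckets could be derived from recent outcome counts roughly like this. The rules and thresholds below are assumptions for illustration; the real classification lives in the Source Health tool.

```javascript
// Classify a source by its recent article outcomes (thresholds assumed).
function sourceStatus({ kept, discarded, failed }) {
  const total = kept + discarded + failed;
  if (total === 0) return 'quiet';                  // no recent articles at all
  if (failed / total > 0.5) return 'failure-prone'; // mostly scrape failures
  if (kept === 0) return 'partial';                 // active but nothing kept
  return 'active';
}
```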

Pipeline Audit Log

Curation and domain-review inspection

Displays processed articles with the keep/discard outcome and the AI curation reason. It is the place to investigate why an article was kept, discarded, flagged, or repeatedly sourced from a noisy domain.

  • Status review: filters all processed articles by kept or discarded status.
  • Reason review: shows the model's curation reason and scrape/error details where available.
  • Domain review: identifies domains with repeated discarded articles and no recent kept articles.
  • Block workflow: offers review-before-block controls so noisy domains are not removed blindly.
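The domain-review heuristic (repeated discards, no recent kept articles) can be sketched as an aggregation. The minimum-discard threshold is an assumption:

```javascript
// Find domains whose recent articles were all discarded, at least minDiscards times.
function noisyDomains(articles, minDiscards = 3) {
  const counts = new Map();
  for (const a of articles) {
    const c = counts.get(a.domain) || { kept: 0, discarded: 0 };
    if (a.kept) c.kept += 1; else c.discarded += 1;
    counts.set(a.domain, c);
  }
  return [...counts]
    .filter(([, c]) => c.kept === 0 && c.discarded >= minDiscards)
    .map(([domain]) => domain);
}
```

Candidates surfaced this way still go through the review-before-block controls rather than being blocked automatically.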

Operations

Live Articles Dashboard

Source monitoring utility

The live dashboard shows kept Mozilla and Firefox coverage, review state, keyword snippets, source metadata, full text, and copy actions for handoff.

  • Data source: queries kept articles from Supabase for a selected time window.
  • Filters: narrows by category, review status, and blocklist domain.
  • Review state: updates the reviewed field in Supabase from the card action.
  • Copy actions: copies either title-plus-URL pairs or URLs only for the current filtered view.
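The two copy modes amount to formatting the current filtered list one of two ways. A minimal sketch, with the output separators assumed:

```javascript
// Build the clipboard payload for the current filtered view.
function copyPayload(articles, mode = 'title-url') {
  if (mode === 'urls') return articles.map(a => a.url).join('\n');
  return articles.map(a => `${a.title}\n${a.url}`).join('\n\n');
}
```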

Run Monitor

Manual pipeline trigger

Starts a monitor run without waiting for the scheduled GitHub Actions cadence. It is protected by a PIN and calls the Cloudflare Pages Function that dispatches the GitHub workflow.

  • Trigger: sends a POST request to /api/run-monitor.
  • Security: requires a configured PIN and GITHUB_WORKFLOW_TOKEN stored server-side with Actions write access.
  • Use case: good after feed changes, verified-publisher additions, or urgent coverage checks.
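Behind the Pages Function, the dispatch is a call to GitHub's real `workflow_dispatch` REST endpoint. The sketch below builds that request; the owner, repo, and workflow file name are placeholders, and the token is the server-side GITHUB_WORKFLOW_TOKEN, which never reaches the browser.

```javascript
// Build the GitHub workflow_dispatch request the server-side function sends.
function dispatchRequest(owner, repo, workflowFile, ref, token) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/actions/workflows/${workflowFile}/dispatches`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,      // server-side token, Actions write access
        Accept: 'application/vnd.github+json',
      },
      body: JSON.stringify({ ref }),           // branch or tag to run the workflow on
    },
  };
}
```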

Blocklist Manager

Noise control

Lets analysts block domains that repeatedly produce irrelevant results. Blocked domains are hidden in the dashboard and excluded during future ingestion.

  • Dashboard filter: hides articles whose URL domain appears in the blocklist.
  • Ingestion filter: the pipeline reads the same blocklist so future runs can skip blocked domains.
  • Manual action: blocks can be added from cards, from the manager modal, or from audit/domain review workflows.
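The shared domain check might look like the following sketch; whether subdomains of a blocked domain are also hidden is an assumption here:

```javascript
// True when the article's host matches a blocked domain (or a subdomain of one).
function isBlocked(articleUrl, blocklist) {
  const host = new URL(articleUrl).hostname.toLowerCase().replace(/^www\./, '');
  return blocklist.some(d => host === d || host.endsWith('.' + d));
}
```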

Add New RSS/Search Feed

Source onboarding

Adds a new RSS or alert feed to the ingestion list without automatically treating the publisher as verified. This is the right tool for new Google Alerts, Talkwalker Alerts, or other feed-based inputs.

  • Inputs: source type, feed label, RSS/feed URL, optional publisher domain, and PIN.
  • Stored fields: writes a new row to data/feeds.csv using the existing Source Type, URL, and Label schema.
  • Optional verification: can also add the domain to verified_publishers.js when the source has already been reviewed.
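Writing the new row means emitting one CSV line in the Source Type, URL, Label column order described above. A minimal sketch with basic CSV escaping (the real writer may differ):

```javascript
// Format one data/feeds.csv row: Source Type, URL, Label.
function feedCsvRow({ sourceType, url, label }) {
  const esc = v => /[",\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v;
  return [sourceType, url, label].map(esc).join(',');
}
```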

Manage Publisher Tier

Publisher governance

Adds a publisher domain to the verified, discovery, or watchlist tier without changing ingestion. Use this when a domain should be trusted for final reporting, surfaced only for discovery, or handled as a noisy watchlist source.

  • Required input: canonical publisher domain and PIN.
  • Optional input: publisher name for clearer status messages and commit history.
  • Registry effect: verified domains power clean reports, discovery domains stay visible without inflating the brief, and watchlist domains are treated more cautiously.

Reference

Publisher Tier Lists

Verified, discovery, and watchlist registry

Shows the three manually assigned source lists that power compare groupings, report filters, and source confidence labels. Use this when you want to check whether a publisher is already classified.

  • Verified: reviewed sources that can support final reports when the article itself is relevant.
  • Discovery: useful sources that should stay visible while analysts continue learning their value.
  • Watchlist: sources to monitor cautiously because they may be noisy, indirect, or special-purpose.

Glossary and Monitor Logic

Plain-English workflow explanation

Explains terms like caught, not caught, filtered before review, scrape alert, trusted source, discovery, watchlist, and unclassified. It also walks through how a link moves from feed ingestion into the dashboards.

  • Glossary: defines the words used on the compare and reporting pages.
  • Pipeline logic: explains the sequence from feed discovery to scraping, filtering, AI review, and storage.
  • Analyst actions: separates source assignment from article relevance decisions.

Suggested Workflow

  1. Start in the Reports Workspace to choose the active report and review candidate coverage.
  2. Use Source Health when a feed appears too quiet, noisy, or scrape-heavy.
  3. Run Feed Coverage Audit when a publisher batch feed may be missing stories from high-volume sites.
  4. Use Syndication Check to identify repeated same-story clusters before copying or reporting links.
  5. Use Report Compare when checking monitor results against an analyst-supplied or third-party report.
  6. Generate the Coverage Report with the Verified tier when you need a clean, printable brief from trusted publishers only.

Design note: AI grouping for the report should be added server-side, not inside the browser. The safe design is to pre-cluster similar articles with the syndication logic, send compact metadata/snippets to the model, and store or render strict JSON story groups.
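"Strict JSON" implies validating the model's output before storing or rendering it. A sketch of that validation step, with the group shape (`headline` plus `article_ids`) being a hypothetical schema, not a decided one:

```javascript
// Parse and validate model output into story groups; throw on any shape violation.
function parseStoryGroups(raw) {
  const data = JSON.parse(raw); // throws on non-JSON output
  if (!Array.isArray(data.groups)) throw new Error('groups must be an array');
  for (const g of data.groups) {
    if (typeof g.headline !== 'string' || !Array.isArray(g.article_ids)) {
      throw new Error('each group needs a headline and article_ids');
    }
  }
  return data.groups;
}
```

Rejecting malformed output at this boundary keeps a bad model response from corrupting the stored report.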