
Feat/pom fetcher #4144

Draft
ulemons wants to merge 14 commits into main from feat/pom-fetcher

Conversation

Contributor

@ulemons ulemons commented May 26, 2026

Summary

Changes

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Performance improvement
  • Chore / dependency update
  • Documentation

JIRA ticket

themarolt and others added 14 commits May 25, 2026 22:01
Signed-off-by: Uroš Marolt <uros@marolt.me>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…dRepos)

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…plicitly

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…ved to packages_worker)

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings May 26, 2026 15:59
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 4 committers have signed the CLA.

✅ joanagmaia
✅ themarolt
❌ ulemons
❌ mbani01
You have signed the CLA already but the status is still pending? Let us recheck it.

Contributor

@github-actions github-actions Bot left a comment

PR titles must follow Conventional Commits. Love from, Your reviewers ❤️.

@ulemons ulemons changed the base branch from main to feat/track-packages May 26, 2026 16:00
Contributor

Copilot AI left a comment

Pull request overview

Introduces a new Maven “POM fetcher” worker loop to enrich packages data in the osspckgs database by fetching Maven Central metadata/POMs, and adds corresponding data-access-layer helpers for selecting candidates and upserting packages/maintainers.

Changes:

  • Added @crowd/data-access-layer osspckgs module with queries for Maven enrichment candidates and upserts into packages, maintainers, and package_maintainers.
  • Added a pom-fetcher worker (config + entrypoint + enrichment loop) that resolves latest Maven versions and extracts POM metadata (licenses, SCM, developers/contributors).
  • Wired up scripts/deps for running the new worker (package.json scripts, docker-compose service yaml, lockfile updates).
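
A minimal sketch of the "resolve latest version, then fetch the POM" step described above, assuming the standard Maven Central repository layout. The helper names (`metadataUrl`, `pomUrl`, `pickLatestVersion`) are hypothetical and not taken from this PR, and the regex extraction is a simplification: the PR itself depends on fast-xml-parser for XML handling.

```typescript
// Assumption: the public Maven Central base URL; the PR may use a different one.
const MAVEN_CENTRAL = 'https://repo1.maven.org/maven2'

// Maven stores artifacts under the groupId with dots turned into path segments.
function groupPath(groupId: string): string {
  return groupId.split('.').join('/')
}

function metadataUrl(groupId: string, artifactId: string): string {
  return `${MAVEN_CENTRAL}/${groupPath(groupId)}/${artifactId}/maven-metadata.xml`
}

function pomUrl(groupId: string, artifactId: string, version: string): string {
  return `${MAVEN_CENTRAL}/${groupPath(groupId)}/${artifactId}/${version}/${artifactId}-${version}.pom`
}

// Returns the text of the first <name>...</name> element, or null if absent.
function firstTag(xml: string, name: string): string | null {
  const match = xml.match(new RegExp(`<${name}>\\s*([^<\\s]+)\\s*</${name}>`))
  return match?.[1] ?? null
}

// Prefer <release>, then <latest>, then the last <version> listed (hypothetical policy).
function pickLatestVersion(metadataXml: string): string | null {
  const preferred = firstTag(metadataXml, 'release') ?? firstTag(metadataXml, 'latest')
  if (preferred) return preferred

  const versionPattern = /<version>\s*([^<\s]+)\s*<\/version>/g
  let last: string | null = null
  let match: RegExpExecArray | null
  while ((match = versionPattern.exec(metadataXml)) !== null) {
    last = match[1] ?? last
  }
  return last
}
```

With these helpers, a caller would fetch `metadataUrl(...)`, pass the body to `pickLatestVersion`, and only then request `pomUrl(...)` for the resolved version.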

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/osspckgs/types.ts | Adds DB-facing types for osspckgs package/maintainer upserts and universe rows. |
| services/libs/data-access-layer/src/osspckgs/packages.ts | Adds query to list Maven universe packages needing enrichment + upsert into packages. |
| services/libs/data-access-layer/src/osspckgs/maintainers.ts | Adds upserts for maintainers and package_maintainers. |
| services/libs/data-access-layer/src/osspckgs/index.ts | Re-exports osspckgs DAL surface. |
| services/libs/data-access-layer/src/index.ts | Exposes osspckgs DAL from the package root. |
| services/apps/packages_worker/src/pom-fetcher/runPomEnrichmentLoop.ts | Implements batch/concurrent enrichment loop and persistence of extracted metadata. |
| services/apps/packages_worker/src/pom-fetcher/metadata.ts | Resolves latest version via maven-metadata.xml. |
| services/apps/packages_worker/src/pom-fetcher/extract.ts | Fetches POMs and extracts fields with limited parent inheritance traversal. |
| services/apps/packages_worker/src/config.ts | Adds pom-fetcher config loader. |
| services/apps/packages_worker/src/bin/pom-fetcher.ts | Adds runnable entrypoint with shutdown handling. |
| services/apps/packages_worker/package.json | Adds scripts and deps (axios, fast-xml-parser) for pom-fetcher. |
| scripts/services/pom-fetcher.yaml | Adds docker-compose service definition for pom-fetcher. |
| pnpm-lock.yaml | Updates lockfile for new deps (but includes an unexpected workspace importer). |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +3 to +10
export interface IDbPackageUniverse {
  id: number
  purl: string | null
  ecosystem: string
  namespace: string | null
  name: string
  rankInEcosystem: number | null
}
Comment on lines +40 to +44
export type IDbPackageMaintainerUpsert = {
  packageId: number
  maintainerId: number
  role: 'author' | 'maintainer' | null
}
Comment on lines +22 to +35
SELECT
  pu.id,
  pu.purl,
  pu.namespace,
  pu.name
FROM packages_universe pu
LEFT JOIN packages p ON p.purl = pu.purl
WHERE
  pu.ecosystem = 'maven'
  AND pu.namespace IS NOT NULL
  AND (
    p.id IS NULL
    OR p.last_synced_at < NOW() - ($(staleDays) || ' days')::interval
  )
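
For readers less familiar with the interval arithmetic, the selection rule in the query above can be mirrored as a small predicate. This is only a sketch: `CandidateRow` and `needsEnrichment` are hypothetical names, and `lastSyncedAt: null` is an assumption that stands in for "the LEFT JOIN found no row in packages".

```typescript
// Hypothetical shape of one joined row; not a type from the PR.
interface CandidateRow {
  namespace: string | null
  // Assumption: null means no matching packages row exists yet.
  lastSyncedAt: Date | null
}

const MS_PER_DAY = 24 * 60 * 60 * 1000

// True when the row would be picked up for enrichment under the rule sketched above.
function needsEnrichment(row: CandidateRow, staleDays: number, now: Date = new Date()): boolean {
  // Mirrors "pu.namespace IS NOT NULL": Maven coordinates need a groupId.
  if (row.namespace === null) return false
  // Mirrors "p.id IS NULL": never enriched before.
  if (row.lastSyncedAt === null) return true
  // Mirrors "p.last_synced_at < NOW() - staleDays days".
  return row.lastSyncedAt.getTime() < now.getTime() - staleDays * MS_PER_DAY
}
```

The predicate takes `now` as a parameter so the cutoff can be checked deterministically in tests.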
Comment on lines +51 to +52
export async function upsertPackage(qx: QueryExecutor, item: IDbPackageUpsert): Promise<number> {
  const row = await qx.selectOne(
Comment on lines +94 to +95
  return row.id as number
}
Comment on lines +123 to +127
for (const person of allPeople) {
  const username = person.username ?? person.email ?? person.displayName
  if (!username) continue

  const emailHash = person.email
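
One detail of the fallback chain above: `??` only skips `null` and `undefined`. If the POM parser ever yields an empty string for `username`, the chain stops there and the following `if (!username) continue` drops the person even when an email is present. Whether that can happen depends on how the POM is parsed, which this excerpt does not show. A hedged alternative, with the hypothetical names `PomPerson` and `pickIdentifier`:

```typescript
// Hypothetical shape; the real type in the PR may differ.
interface PomPerson {
  username?: string | null
  email?: string | null
  displayName?: string | null
}

// Returns the first non-blank value among username, email and displayName, or null.
function pickIdentifier(person: PomPerson): string | null {
  for (const candidate of [person.username, person.email, person.displayName]) {
    const trimmed = candidate?.trim()
    if (trimmed) return trimmed
  }
  return null
}
```

Compared with the `??` chain, this treats blank strings the same as missing values, so a person with an empty id but a valid email is still kept.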
Comment on lines +211 to +215
    totalProcessed += result.processed
    totalSkipped += result.skipped
    totalErrors += result.errors
    offset += config.batchSize
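
A general property worth keeping in mind when an offset advances like this: if processed rows stop matching the candidate query (for example because `last_synced_at` is refreshed), the result set shrinks under the offset and some rows are never visited. The excerpt does not show the LIMIT/OFFSET clause or what the upsert updates, so the toy simulation below is an assumption-laden illustration, not a statement about this PR's behavior. `drain` is a hypothetical name.

```typescript
// Simulates draining `total` pending ids in batches.
// Processed ids leave the pending set, as they would if the candidate query stopped matching them.
function drain(total: number, batchSize: number, advanceOffset: boolean): number {
  let pending = Array.from({ length: total }, (_, i) => i)
  let offset = 0
  let processed = 0

  while (true) {
    const batch = pending.slice(offset, offset + batchSize)
    if (batch.length === 0) break

    const done = new Set(batch)
    pending = pending.filter((id) => !done.has(id))
    processed += batch.length

    // Advancing the offset over a shrinking set skips the rows that slid into the gap.
    if (advanceOffset) offset += batchSize
  }
  return processed
}
```

In this model, `drain(10, 2, true)` visits only 6 of 10 rows, while `drain(10, 2, false)` visits all 10. Keeping the offset at zero has its own caveat: rows that fail and stay pending would be re-read on every pass, so a real loop would need an attempt marker or a keyset cursor.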

Comment on lines +86 to +95
  } catch (err) {
    if (axios.isAxiosError(err)) {
      const status = err.response?.status
      if (status === 404) {
        log?.(`POM not found (404): ${url}`)
        return null
      }
      log?.(`HTTP ${status ?? 'unknown'} fetching POM: ${url}`)
      return null
    }
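
The handler above logs and returns `null` for every HTTP failure, so a caller sees the same result for a missing POM as for a rate limit or a 5xx response. One possible way to keep that distinction is to classify the status first. This is a sketch only: `FetchFailure` and `classifyFetchFailure` are hypothetical names, and the retry policy shown is an assumption rather than anything stated in the PR.

```typescript
// Hypothetical outcome categories for a failed POM request.
type FetchFailure = 'not_found' | 'retryable' | 'fatal'

// Maps an HTTP status (undefined when no response was received) to a category.
function classifyFetchFailure(status: number | undefined): FetchFailure {
  // No response at all: timeout, DNS failure, connection reset.
  if (status === undefined) return 'retryable'
  // The artifact has no POM at this URL; retrying will not help.
  if (status === 404) return 'not_found'
  // Timeouts, rate limits and server errors are usually transient.
  if (status === 408 || status === 429 || status >= 500) return 'retryable'
  // Remaining 4xx responses (401, 403, ...) point at a request problem.
  return 'fatal'
}
```

A caller could then count `not_found` as skipped and `retryable` as an error to revisit on the next pass, instead of folding both into `null`.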
Comment on lines +33 to +39
export function getPomFetcherConfig() {
  return {
    batchSize: parseInt(process.env.POM_FETCHER_BATCH_SIZE ?? '200', 10),
    concurrency: parseInt(process.env.POM_FETCHER_CONCURRENCY ?? '10', 10),
    staleDays: parseInt(process.env.POM_FETCHER_STALE_DAYS ?? '7', 10),
    idleSleepSec: parseInt(process.env.POM_FETCHER_IDLE_SLEEP_SEC ?? '3600', 10),
  }
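
In the loader above, the `??` default applies only when a variable is unset. A value that is set but empty or non-numeric makes `parseInt` return `NaN`, which would then flow into batch sizing and sleep timing. A minimal sketch of a stricter reader follows; `readPositiveInt` and `getPomFetcherConfigFrom` are hypothetical helpers, and taking the environment as a parameter is a choice made here for testability.

```typescript
type Env = Record<string, string | undefined>

// Reads a positive integer from env, falling back when the value is missing, blank or invalid.
function readPositiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key]
  if (raw === undefined || raw.trim() === '') return fallback
  const parsed = Number.parseInt(raw, 10)
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback
}

// Same keys and defaults as the loader above, but with validation applied.
function getPomFetcherConfigFrom(env: Env) {
  return {
    batchSize: readPositiveInt(env, 'POM_FETCHER_BATCH_SIZE', 200),
    concurrency: readPositiveInt(env, 'POM_FETCHER_CONCURRENCY', 10),
    staleDays: readPositiveInt(env, 'POM_FETCHER_STALE_DAYS', 7),
    idleSleepSec: readPositiveInt(env, 'POM_FETCHER_IDLE_SLEEP_SEC', 3600),
  }
}
```

In the worker this would be called as `getPomFetcherConfigFrom(process.env)`. Silently falling back is one option; throwing on an invalid value is equally defensible if misconfiguration should fail fast.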
Comment thread pnpm-lock.yaml
Comment on lines +1375 to +1379
  services/apps/pom_fetcher_worker:
    dependencies:
      '@crowd/archetype-standard':
        specifier: workspace:*
        version: link:../../archetypes/standard
Base automatically changed from feat/track-packages to main May 26, 2026 17:44

Labels

None yet

Projects

None yet

6 participants