Feat/pom fetcher #4144
Signed-off-by: Uroš Marolt <uros@marolt.me>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…dRepos) Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…plicitly Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…ved to packages_worker) Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
PR titles must follow Conventional Commits. Love from, Your reviewers ❤️.
Pull request overview
Introduces a new Maven “POM fetcher” worker loop to enrich packages data in the osspckgs database by fetching Maven Central metadata/POMs, and adds corresponding data-access-layer helpers for selecting candidates and upserting packages/maintainers.
Changes:
- Added `@crowd/data-access-layer` `osspckgs` module with queries for Maven enrichment candidates and upserts into `packages`, `maintainers`, and `package_maintainers`.
- Added a `pom-fetcher` worker (config + entrypoint + enrichment loop) that resolves latest Maven versions and extracts POM metadata (licenses, SCM, developers/contributors).
- Wired up scripts/deps for running the new worker (package.json scripts, docker-compose service yaml, lockfile updates).
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| services/libs/data-access-layer/src/osspckgs/types.ts | Adds DB-facing types for osspckgs package/maintainer upserts and universe rows. |
| services/libs/data-access-layer/src/osspckgs/packages.ts | Adds query to list Maven universe packages needing enrichment + upsert into packages. |
| services/libs/data-access-layer/src/osspckgs/maintainers.ts | Adds upserts for maintainers and package_maintainers. |
| services/libs/data-access-layer/src/osspckgs/index.ts | Re-exports osspckgs DAL surface. |
| services/libs/data-access-layer/src/index.ts | Exposes osspckgs DAL from the package root. |
| services/apps/packages_worker/src/pom-fetcher/runPomEnrichmentLoop.ts | Implements batch/concurrent enrichment loop and persistence of extracted metadata. |
| services/apps/packages_worker/src/pom-fetcher/metadata.ts | Resolves latest version via maven-metadata.xml. |
| services/apps/packages_worker/src/pom-fetcher/extract.ts | Fetches POMs and extracts fields with limited parent inheritance traversal. |
| services/apps/packages_worker/src/config.ts | Adds pom-fetcher config loader. |
| services/apps/packages_worker/src/bin/pom-fetcher.ts | Adds runnable entrypoint with shutdown handling. |
| services/apps/packages_worker/package.json | Adds scripts and deps (axios, fast-xml-parser) for pom-fetcher. |
| scripts/services/pom-fetcher.yaml | Adds docker-compose service definition for pom-fetcher. |
| pnpm-lock.yaml | Updates lockfile for new deps (but includes an unexpected workspace importer). |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
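
The overview mentions resolving the latest version via maven-metadata.xml. As a rough illustration of that step, the sketch below builds the Maven Central metadata URL for a group and artifact and picks a version out of the XML. The function names are hypothetical and not taken from the PR, and the regex-based extraction is a deliberate simplification: the PR itself lists fast-xml-parser as a dependency, so the real implementation presumably parses the document properly.

```typescript
// Hypothetical helpers for illustration; names are not taken from the PR.

/** Builds the Maven Central URL of an artifact's maven-metadata.xml. */
function mavenMetadataUrl(groupId: string, artifactId: string): string {
  // Maven repositories lay out group IDs as directories:
  // org.apache.commons -> org/apache/commons
  const groupPath = groupId.split('.').join('/')
  return `https://repo1.maven.org/maven2/${groupPath}/${artifactId}/maven-metadata.xml`
}

/** Returns the trimmed text of the first <tag>...</tag> element, or null. */
function firstTagText(xml: string, tag: string): string | null {
  const match = new RegExp(`<${tag}>\\s*([^<]*?)\\s*</${tag}>`).exec(xml)
  return match && match[1] !== '' ? match[1] : null
}

/** Prefers <release>, then <latest>, then the last <version> listed. */
function resolveLatestVersion(metadataXml: string): string | null {
  const release = firstTagText(metadataXml, 'release')
  if (release) return release
  const latest = firstTagText(metadataXml, 'latest')
  if (latest) return latest

  // Fall back to the last <version> entry in document order.
  const versionPattern = /<version>\s*([^<]*?)\s*<\/version>/g
  let last: string | null = null
  let match = versionPattern.exec(metadataXml)
  while (match !== null) {
    if (match[1] !== '') last = match[1]
    match = versionPattern.exec(metadataXml)
  }
  return last
}
```

The preference order (release, then latest, then the last listed version) is an assumption made for this sketch; the excerpt does not show which order the worker uses.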
export interface IDbPackageUniverse {
  id: number
  purl: string | null
  ecosystem: string
  namespace: string | null
  name: string
  rankInEcosystem: number | null
}

export type IDbPackageMaintainerUpsert = {
  packageId: number
  maintainerId: number
  role: 'author' | 'maintainer' | null
}

SELECT
  pu.id,
  pu.purl,
  pu.namespace,
  pu.name
FROM packages_universe pu
LEFT JOIN packages p ON p.purl = pu.purl
WHERE
  pu.ecosystem = 'maven'
  AND pu.namespace IS NOT NULL
  AND (
    p.id IS NULL
    OR p.last_synced_at < NOW() - ($(staleDays) || ' days')::interval
  )

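
To make the selection rule in the query above easier to reason about, here is a minimal in-memory mirror of its WHERE clause. It is a sketch, not code from the PR: the `CandidateRow` shape and `needsEnrichment` name are hypothetical, and a day is approximated as 24 hours, which matches the SQL interval arithmetic only when the database session runs in UTC.

```typescript
// Hypothetical mirror of the WHERE clause; not code from the PR.
interface CandidateRow {
  ecosystem: string
  namespace: string | null
  // null models "LEFT JOIN found no row in packages" (p.id IS NULL).
  pkg: { lastSyncedAt: Date | null } | null
}

const MS_PER_DAY = 24 * 60 * 60 * 1000

function needsEnrichment(row: CandidateRow, staleDays: number, now: Date): boolean {
  if (row.ecosystem !== 'maven' || row.namespace === null) return false
  if (row.pkg === null) return true // never enriched
  // In SQL, comparing a NULL last_synced_at yields NULL, which the WHERE
  // clause treats as not true, so such a row is NOT selected.
  if (row.pkg.lastSyncedAt === null) return false
  return row.pkg.lastSyncedAt.getTime() < now.getTime() - staleDays * MS_PER_DAY
}
```

One consequence worth noting: a `packages` row that exists with a NULL `last_synced_at` is never picked up by this predicate. Whether such rows can occur depends on the schema, which the excerpt does not show.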
export async function upsertPackage(qx: QueryExecutor, item: IDbPackageUpsert): Promise<number> {
  const row = await qx.selectOne(
  return row.id as number
}

for (const person of allPeople) {
  const username = person.username ?? person.email ?? person.displayName
  if (!username) continue

  const emailHash = person.email

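
A detail of the fallback chain above: `??` only falls through on `null` or `undefined`, so an empty-string `username` would stop the chain and the person would then be skipped by the `!username` check even if an email is present. The sketch below shows one possible way to fall through on blank values as well. The `PomPerson` shape and `pickIdentifier` name are hypothetical; only the three field names come from the excerpt.

```typescript
// Hypothetical shape; only username, email and displayName appear in the excerpt.
interface PomPerson {
  username?: string | null
  email?: string | null
  displayName?: string | null
}

/** Returns the first candidate that is non-empty after trimming, or null. */
function pickIdentifier(person: PomPerson): string | null {
  const candidates = [person.username, person.email, person.displayName]
  for (const candidate of candidates) {
    const trimmed = candidate?.trim()
    if (trimmed) return trimmed
  }
  return null
}
```

Whether blank `<id>` elements actually occur in the POMs the worker processes is not something the excerpt establishes, so treat this as a defensive option rather than a required fix.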
totalProcessed += result.processed
totalSkipped += result.skipped
totalErrors += result.errors
offset += config.batchSize

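
Advancing `offset` by the batch size interacts with the candidate query if successfully processed rows drop out of the candidate set (for example because the upsert refreshes `last_synced_at`). That is an assumption here, since the excerpt shows neither the upsert SQL nor the rest of the loop. Under that assumption, the small simulation below shows how offset paging over a shrinking set can skip rows within a pass. All names are hypothetical.

```typescript
// Hypothetical simulation; the `pending` array stands in for the candidate query.

/** Pages with a growing offset while processed ids leave the candidate set. */
function drainWithOffset(ids: number[], batchSize: number): number[] {
  let pending = ids.slice()
  const processed: number[] = []
  let offset = 0
  while (true) {
    const batch = pending.slice(offset, offset + batchSize)
    if (batch.length === 0) break
    processed.push(...batch)
    // Processed rows no longer match the candidate query.
    pending = pending.filter((id) => batch.indexOf(id) === -1)
    offset += batchSize
  }
  return processed
}

/** Always reads the first page; safe only if every row in a batch leaves the set. */
function drainFromStart(ids: number[], batchSize: number): number[] {
  let pending = ids.slice()
  const processed: number[] = []
  while (pending.length > 0) {
    const batch = pending.slice(0, batchSize)
    processed.push(...batch)
    pending = pending.filter((id) => batch.indexOf(id) === -1)
  }
  return processed
}
```

With six ids and a batch size of two, the offset variant visits ids 1, 2, 5 and 6 and leaves 3 and 4 for a later pass, while the second variant visits all six. The second variant would loop forever if failed rows stayed in the set, so a real loop would need a stable ordering plus a keyset cursor or a way to mark failures; which of these the PR uses is not visible in the excerpt.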
} catch (err) {
  if (axios.isAxiosError(err)) {
    const status = err.response?.status
    if (status === 404) {
      log?.(`POM not found (404): ${url}`)
      return null
    }
    log?.(`HTTP ${status ?? 'unknown'} fetching POM: ${url}`)
    return null
  }

export function getPomFetcherConfig() {
  return {
    batchSize: parseInt(process.env.POM_FETCHER_BATCH_SIZE ?? '200', 10),
    concurrency: parseInt(process.env.POM_FETCHER_CONCURRENCY ?? '10', 10),
    staleDays: parseInt(process.env.POM_FETCHER_STALE_DAYS ?? '7', 10),
    idleSleepSec: parseInt(process.env.POM_FETCHER_IDLE_SLEEP_SEC ?? '3600', 10),
  }

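
`parseInt` returns `NaN` for a non-numeric value such as `abc` and silently accepts a prefix such as `10abc` as 10, so a mistyped environment variable would flow into the batch size or concurrency unnoticed. A minimal sketch of a stricter reader follows. The `readPositiveInt` helper and `Env` type are hypothetical and not part of the PR, and the environment is passed in as a plain object so the snippet has no Node-specific dependencies.

```typescript
// Hypothetical helper; not part of the PR.
type Env = Record<string, string | undefined>

/** Reads a positive integer from env, using the fallback when unset or blank. */
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name]
  if (raw === undefined || raw.trim() === '') return fallback
  // Number() rejects trailing garbage ("10abc" -> NaN), unlike parseInt.
  const value = Number(raw)
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`)
  }
  return value
}

/** Same keys and defaults as the config loader above, with validation added. */
function getValidatedPomFetcherConfig(env: Env) {
  return {
    batchSize: readPositiveInt(env, 'POM_FETCHER_BATCH_SIZE', 200),
    concurrency: readPositiveInt(env, 'POM_FETCHER_CONCURRENCY', 10),
    staleDays: readPositiveInt(env, 'POM_FETCHER_STALE_DAYS', 7),
    idleSleepSec: readPositiveInt(env, 'POM_FETCHER_IDLE_SLEEP_SEC', 3600),
  }
}
```

In the worker this would be called with `process.env`. Failing fast at startup is a design choice made for this sketch; falling back to the default with a warning would be an equally reasonable alternative.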
services/apps/pom_fetcher_worker:
  dependencies:
    '@crowd/archetype-standard':
      specifier: workspace:*
      version: link:../../archetypes/standard