
Software Composition Analysis: What's in Your Codebase

Vaibhav Verma
8 min read
code-intelligence · security · dependency-management · sca · sbom · devops

Last year, one of our dependencies had a critical vulnerability published on a Friday afternoon. CVE-2025-32751, a remote code execution flaw in a JSON parsing library. The fix took us 4 hours. But the real question was harder: how many of our 847 transitive dependencies were affected? We didn't know, because we'd never systematically inventoried what was in our codebase. That weekend I set up our first Software Composition Analysis (SCA) pipeline, and what we found was alarming.

34% of our dependencies hadn't been updated in over 18 months. 11 had known vulnerabilities, 3 of them rated high severity. 2 were abandoned projects with no maintainer. And one dependency, pulled in transitively by a testing library, had a license that was incompatible with our commercial use.

This is the state of most codebases. Not because teams are careless, but because nobody's watching. SCA is how you start watching.

What Software Composition Analysis Actually Does

SCA tools scan your codebase to build a complete inventory of every open-source component, direct or transitive. Then they cross-reference that inventory against vulnerability databases (NVD, GitHub Advisory Database, OSV), license databases, and maintenance health indicators.

The output is three things:

  1. A Software Bill of Materials (SBOM): A complete list of every component in your application, with versions.
  2. Vulnerability alerts: Known CVEs affecting your dependencies, with severity ratings and fix availability.
  3. License compliance report: What licenses govern your dependencies and whether any conflict with your distribution model.
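To make the first output concrete, here's a minimal sketch that flattens a CycloneDX-style SBOM into an inventory of name/version/license rows. The JSON shape follows the CycloneDX `components` array; the two components below are made up for illustration, and real SBOMs would come from a generator like Trivy or Syft:

```python
import json

# A tiny CycloneDX-style SBOM fragment (hypothetical components, for illustration).
sbom_json = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"name": "lodash", "version": "4.17.21",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "left-pad", "version": "1.3.0",
     "licenses": [{"license": {"id": "WTFPL"}}]}
  ]
}
"""

def inventory(sbom: dict) -> list[tuple[str, str, str]]:
    """Flatten an SBOM into (name, version, license) rows."""
    rows = []
    for comp in sbom.get("components", []):
        lic = "UNKNOWN"
        for entry in comp.get("licenses", []):
            lic = entry.get("license", {}).get("id", lic)
        rows.append((comp["name"], comp["version"], lic))
    return rows

for name, version, lic in inventory(json.loads(sbom_json)):
    print(f"{name}=={version}  [{lic}]")
```

Once you have this flat inventory, the vulnerability and license reports are just joins against it.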

This isn't optional anymore. Executive Order 14028 (from 2021) requires SBOMs for software sold to the US federal government. The EU Cyber Resilience Act extends similar requirements to the European market starting in 2027. Even if you don't sell to government, your enterprise customers are increasingly requesting SBOMs as part of vendor security assessments.

The Numbers That Should Scare You

I've run SCA scans across 8 production codebases over the past 2 years. Here's what I typically find:

  • Average direct dependencies: 87 (Node.js), 43 (Python), 62 (Go)
  • Average transitive dependencies: 634 (Node.js), 187 (Python), 118 (Go)
  • Dependencies with known vulnerabilities: 4-12% of total
  • Dependencies with no update in 24+ months: 18-31%
  • License conflicts found: At least 1 per codebase

The Node.js numbers are the scariest. A typical Next.js application pulls in 800-1,200 transitive dependencies. That's 800+ pieces of code you didn't write, didn't review, and often don't know exist, all running in your production environment.
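You can measure the direct-versus-transitive split for your own project straight from the lockfile. A sketch against npm's v2/v3 `package-lock.json` layout, where direct dependencies are the root entry's `dependencies` keys and everything installed lives under `node_modules/` paths (the toy lockfile below is hypothetical):

```python
import json

# Toy lockfile in npm's v2/v3 "packages" layout (hypothetical contents).
lock = {
    "packages": {
        "": {"dependencies": {"express": "^4.18.0"}},
        "node_modules/express": {"version": "4.18.2"},
        "node_modules/body-parser": {"version": "1.20.1"},
        "node_modules/qs": {"version": "6.11.0"},
    }
}

# Direct deps are declared at the root; everything else installed is transitive.
direct = set(lock["packages"][""].get("dependencies", {}))
installed = {p.rsplit("node_modules/", 1)[-1]
             for p in lock["packages"] if p.startswith("node_modules/")}
transitive = installed - direct

print(f"direct: {len(direct)}, transitive: {len(transitive)}")
```

Run the same count on a real `package-lock.json` (via `json.load(open("package-lock.json"))`) and the ratio is usually far more lopsided than this toy example.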

Building an SCA Pipeline

Here's how I set up SCA from scratch. The total implementation time is about 2-3 days for the initial setup, plus an ongoing 2-4 hours per week for triage.

Step 1: Choose Your Scanner

The SCA market has matured significantly. Here's my honest assessment of the options:

Open source:

  • npm audit / pip-audit / govulncheck: Free, built in, and they catch the obvious stuff. But they miss transitive vulnerabilities, do no license analysis, and have high false-positive rates. Fine as a starting point.
  • OWASP Dependency-Check: Free, covers multiple languages, integrates with CI. More thorough than built-in tools but slower and requires configuration.
  • Trivy: My recommendation for teams starting out. Free, fast, covers vulnerabilities and some license checking, supports containers and IaC in addition to dependencies.

Commercial:

  • Snyk: Best developer experience. Inline PR suggestions, good IDE integration. $25-50/developer/month.
  • Sonatype Nexus: Best for enterprise governance. Deep license analysis, policy engine. Pricing varies by org size.
  • Mend (formerly WhiteSource): Strong on license compliance. Good for companies with strict IP requirements.

For most teams under 50 engineers, I recommend starting with Trivy (free) and graduating to Snyk if you need the developer workflow integration.

Step 2: Integrate Into CI

```yaml
# .github/workflows/sca.yml
name: Software Composition Analysis
on:
  pull_request:
  schedule:
    - cron: "0 6 * * 1" # weekly Monday scan

jobs:
  vulnerability-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: "fs"
          scan-ref: "."
          severity: "HIGH,CRITICAL"
          exit-code: "1" # fail the build on high/critical
          format: "sarif"
          output: "trivy-results.sarif"
      - name: Upload results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: "trivy-results.sarif"

  license-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx license-checker --production --failOn "GPL-3.0;AGPL-3.0"
```

Step 3: Establish a Dependency Policy

This is where most teams stop, and it's why most SCA implementations fail. Running the scanner is easy. Knowing what to do with the results is hard.

The dependency policy I use:

Vulnerability response SLAs:

  • Critical (CVSS 9.0+): Fix or mitigate within 48 hours
  • High (CVSS 7.0-8.9): Fix within 1 sprint (2 weeks)
  • Medium (CVSS 4.0-6.9): Fix within 1 month
  • Low (CVSS <4.0): Fix during scheduled maintenance
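These SLAs are easy to encode, which lets triage tooling stamp a deadline on every new finding automatically instead of leaving the math to whoever is on call. A minimal sketch; the CVSS-to-hours table mirrors the policy above, and the 90-day window for Low severity is my stand-in for "next scheduled maintenance":

```python
from datetime import datetime, timedelta

# SLA windows from the policy above. The Low tier assumes a quarterly
# maintenance window -- adjust to your own schedule.
SLA_HOURS = [
    (9.0, 48),          # Critical: 48 hours
    (7.0, 14 * 24),     # High: one 2-week sprint
    (4.0, 30 * 24),     # Medium: 1 month
    (0.0, 90 * 24),     # Low: next scheduled maintenance (assumed quarterly)
]

def remediation_deadline(cvss: float, published: datetime) -> datetime:
    """Map a CVSS score to the date the fix is due under the policy."""
    for threshold, hours in SLA_HOURS:
        if cvss >= threshold:
            return published + timedelta(hours=hours)
    raise ValueError(f"invalid CVSS score: {cvss}")

published = datetime(2025, 6, 2, 9, 0)
print(remediation_deadline(9.8, published))  # 48 hours after publication
```

Wire this into your scanner's webhook or nightly report and overdue findings become a simple date comparison.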

New dependency approval criteria:

  • Must have >500 GitHub stars or be from a recognized organization
  • Must have a commit within the last 6 months
  • Must use an approved license (MIT, Apache 2.0, BSD, ISC)
  • Must not have any unpatched high/critical CVEs
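The approval criteria above can also run as code in a pre-merge check, so adding a dependency means passing a gate rather than remembering a wiki page. A sketch, where the metadata dict and its field names are hypothetical stand-ins for whatever you assemble from the package registry and the GitHub API:

```python
from datetime import datetime, timedelta

# Approved licenses, written as SPDX identifiers.
APPROVED_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}

def approve(meta: dict, now: datetime) -> list[str]:
    """Return the list of policy violations (empty list means approved).

    `meta` is a hypothetical metadata dict; the field names are illustrative.
    """
    problems = []
    if meta["stars"] < 500 and not meta.get("trusted_org", False):
        problems.append("fewer than 500 stars and not from a recognized org")
    if now - meta["last_commit"] > timedelta(days=183):  # ~6 months
        problems.append("no commit in the last 6 months")
    if meta["license"] not in APPROVED_LICENSES:
        problems.append(f"license {meta['license']} not on the approved list")
    if meta.get("open_high_cves", 0) > 0:
        problems.append("has unpatched high/critical CVEs")
    return problems

candidate = {
    "stars": 1200,
    "last_commit": datetime(2025, 5, 1),
    "license": "MIT",
    "open_high_cves": 0,
}
print(approve(candidate, datetime(2025, 6, 1)) or "approved")
```

Returning the full violation list (rather than a bare yes/no) gives the requester something actionable when a dependency is rejected.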

Dependency update cadence:

  • Security patches: applied within SLA above
  • Minor versions: updated monthly
  • Major versions: evaluated quarterly, updated when justified

The Contrarian Take

Most SCA advice says "update everything, all the time." I think that's wrong and actively counterproductive. Chasing every minor version update creates noise, introduces upgrade risk, and burns engineering time on work that delivers zero user value.

Here's my approach: I only update dependencies for three reasons. First, security vulnerabilities (following the SLA above). Second, when we need a feature from a newer version. Third, when the current version is approaching end-of-life. That's it.

I've seen teams spend 20% of their sprint capacity on dependency updates that weren't security-related and didn't unlock any new functionality. That's a full day per engineer per week spent on gardening that doesn't make the product better. Pin your versions, monitor for security issues, and update only when there's a concrete reason.

The Stealable Framework: Monthly SCA Review

Schedule a 1-hour monthly meeting with this agenda:

First 15 minutes: Dashboard Review

  • Total dependencies (trending up? Question new additions)
  • Open vulnerability count by severity
  • Average age of dependencies

Next 20 minutes: Critical Items

  • Any new high/critical CVEs since last review
  • Dependencies approaching end-of-life
  • License issues

Next 15 minutes: Policy Compliance

  • Were all vulnerability SLAs met last month?
  • Were any unapproved dependencies added?
  • Are any dependency requests pending review?

Last 10 minutes: Action Items

  • Assign owners for each open item
  • Set deadlines
  • Update the dependency policy if needed

After 6 months of running this cadence, our mean time to remediate critical vulnerabilities dropped from 11 days to 2.3 days. Not because we got faster at patching, but because we caught issues earlier and had a process for prioritizing them.

Your codebase is mostly other people's code. Knowing what's in it, whether it's safe, and whether it's maintained isn't paranoia. It's basic operational hygiene.
