
We Open-Sourced Our Cloud Optimization Agents — Here's Why

The story behind our decision to open-source the core agent framework, what's included, and how you can start using it today.

Today we're releasing the core of Digital Tap AI's cloud optimization agent framework as open source. The repository is live at github.com/digital-tap, and everything you need to start optimizing your Databricks, EMR, Dataproc, and Kubernetes clusters is included.

This wasn't an obvious decision. We spent 18 months building these agents. They're the foundation of a product that saves our customers millions of dollars annually. Giving them away for free seems, on the surface, like a terrible business strategy.

It's actually the best business decision we've made. Here's why.

Why Open Source?

1. Cloud Optimization Shouldn't Be a Black Box

When an autonomous agent decides to hibernate your production cluster, you need to understand why. When it migrates workloads from spot instances, you need to see the decision logic. When it right-sizes your instances, you need to verify it's making the right tradeoffs.

Closed-source optimization tools ask you to trust them with your production infrastructure based on marketing claims. Open-source tools let you read the code, understand the algorithms, and verify the behavior before you deploy.

For enterprises running critical data infrastructure, that transparency isn't a nice-to-have — it's a requirement. Security teams want to audit. Platform engineers want to understand. Compliance teams want to verify. Open source satisfies all of them.

2. The Real Value Is in the Platform, Not the Agents

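The individual agents (idle detection, right-sizing, spot orchestration) are valuable, but they're not the whole product. The Digital Tap platform builds on them with capabilities such as coordinated agents, ML models, and predictive spot interruption avoidance.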

Open-sourcing the agents doesn't commoditize our product — it demonstrates its foundation. Teams that try the open-source agents and see 15-20% savings naturally want to explore what the full platform with coordinated agents and ML models can deliver (typically 35-42%).

3. The Problem Is Too Important for One Company

$44.5 billion in annual cloud waste is an industry-scale problem. Even if Digital Tap becomes the dominant optimization platform, we'll serve thousands of organizations at most. There are hundreds of thousands of companies wasting money on idle clusters.

Open-sourcing our agents means any organization, including those who would never buy a commercial product, can start reducing waste today. A startup with a $5K/month Databricks bill can deploy our idle detection agent and, at the 15-20% savings typical of the open-source agents, save roughly $750 to $1,000/month without paying anything.

That's good for the industry, good for the planet (less wasted energy and water), and ultimately good for us — because it establishes Digital Tap as the standard for cloud optimization.

What's in the Open-Source Release

The release includes the core agent framework and five production-ready agents. Everything is Apache 2.0 licensed.

Core Framework

digital-tap/
├── core/
│   ├── agent.py          # Base agent class with lifecycle management
│   ├── scheduler.py      # Agent scheduling and coordination
│   ├── metrics.py        # Metric collection and aggregation
│   ├── config.py         # Configuration management
│   └── connectors/       # Platform connectors
│       ├── databricks.py
│       ├── emr.py
│       ├── dataproc.py
│       └── kubernetes.py
├── agents/
│   ├── idle_detection/   # Detect and hibernate idle clusters
│   ├── right_sizing/     # Analyze and recommend instance changes
│   ├── spot_manager/     # Basic spot instance lifecycle management
│   ├── cost_anomaly/     # Detect unusual cost patterns
│   └── tag_enforcer/     # Ensure cost allocation tags exist
├── cli/                  # Command-line interface
├── tests/
└── docs/

Agent 1: Idle Detection

The idle detection agent monitors cluster utilization via platform APIs and takes action when clusters sit idle beyond a configurable threshold. It supports Databricks, EMR, Dataproc, and Kubernetes.

# Install the package
pip install digital-tap

# Configure your Databricks connection
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"

# Run the idle detection agent
digital-tap agent run idle_detection \
  --idle-threshold 10m \
  --action hibernate \
  --dry-run  # Remove for production

In dry-run mode, the agent reports what it would do without taking action — perfect for building confidence before enabling automation.
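To make the threshold and dry-run behavior concrete, here is a minimal sketch of the decision logic in plain Python. It is not the agent's actual code: `ClusterStatus` and `plan_idle_actions` are hypothetical names, and a real agent would read last-activity times from the platform APIs rather than from hard-coded values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ClusterStatus:
    # Hypothetical record; a real connector would fill this from platform APIs.
    name: str
    last_activity: datetime


def plan_idle_actions(clusters, now, idle_threshold, action="hibernate", dry_run=True):
    """Return one (cluster_name, decision) pair per cluster idle past the threshold."""
    plan = []
    for cluster in clusters:
        idle_for = now - cluster.last_activity
        if idle_for >= idle_threshold:
            # In dry-run mode we only report the action we would have taken.
            decision = f"would {action}" if dry_run else action
            plan.append((cluster.name, decision))
    return plan


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
clusters = [
    ClusterStatus("etl-nightly", now - timedelta(minutes=45)),
    ClusterStatus("adhoc-analytics", now - timedelta(minutes=3)),
]
print(plan_idle_actions(clusters, now, timedelta(minutes=10)))
# [('etl-nightly', 'would hibernate')]
```

Keeping the planning step separate from execution is what makes a dry run cheap: the same plan can be printed for review or handed to whatever code performs the action.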

Agent 2: Right-Sizing

The right-sizing agent analyzes actual CPU, memory, and I/O utilization over configurable time windows and generates right-sizing recommendations. For Databricks, it can also apply those recommendations automatically through cluster policies.

# Run right-sizing analysis
digital-tap agent run right_sizing \
  --lookback 7d \
  --target-utilization 70 \
  --output recommendations.json
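As a rough illustration of what a target utilization of 70 means, the sketch below sizes a node so that the observed CPU load lands at or below the target. It is an assumption-laden simplification, not the agent's algorithm: it looks at CPU only, and `AVAILABLE_VCPUS` and `recommend_vcpus` are hypothetical names.

```python
import math

# Hypothetical size catalog (vCPUs per node); real instance families differ.
AVAILABLE_VCPUS = [2, 4, 8, 16, 32, 64]


def recommend_vcpus(current_vcpus, observed_cpu_pct, target_utilization_pct=70):
    """Smallest catalog size that keeps the observed load at or below the target."""
    # vCPUs needed so that the same absolute load sits at the target utilization.
    needed = math.ceil(current_vcpus * observed_cpu_pct / target_utilization_pct)
    for size in AVAILABLE_VCPUS:
        if size >= needed:
            return size
    # Load exceeds the largest size in the catalog; return the largest available.
    return AVAILABLE_VCPUS[-1]


print(recommend_vcpus(16, 35))  # 8  (over-provisioned: downsize)
print(recommend_vcpus(8, 90))   # 16 (running hot: upsize)
```

A production recommendation would also need to weigh memory and I/O, and would likely use a percentile of utilization over the lookback window rather than a single number.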

Agent 3: Spot Manager

Manages spot instance lifecycle with automatic fallback to on-demand. Monitors spot interruption warnings and handles graceful task migration. This is the community version — the full platform adds predictive interruption avoidance.
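The fallback behavior can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the agent's implementation: `Node` and `handle_interruption` are hypothetical, tasks are plain strings, and "requesting" an on-demand node is simulated by appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    lifecycle: str   # "spot" or "on-demand"
    capacity: int    # maximum number of tasks the node can hold
    tasks: list = field(default_factory=list)


def handle_interruption(nodes, warned_name):
    """Drain the warned spot node; return names of on-demand nodes added as fallback."""
    warned = next(n for n in nodes if n.name == warned_name)
    survivors = [n for n in nodes if n is not warned]
    added = []
    for task in list(warned.tasks):
        # Prefer spare capacity on nodes that are not being interrupted.
        target = next((n for n in survivors if len(n.tasks) < n.capacity), None)
        if target is None:
            # No spare capacity left: fall back to a new on-demand node.
            target = Node(f"on-demand-{len(added) + 1}", "on-demand", warned.capacity)
            survivors.append(target)
            added.append(target.name)
        target.tasks.append(task)
        warned.tasks.remove(task)
    nodes[:] = survivors  # the warned node leaves the pool once drained
    return added


nodes = [
    Node("spot-a", "spot", 2, ["t1", "t2"]),
    Node("spot-b", "spot", 2, ["t3"]),
]
added = handle_interruption(nodes, "spot-a")
print(added)                                # ['on-demand-1']
print([(n.name, n.tasks) for n in nodes])   # [('spot-b', ['t3', 't1']), ('on-demand-1', ['t2'])]
```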

Agent 4: Cost Anomaly Detection

Uses statistical analysis to detect unusual cost patterns — a sudden spike in cluster count, an instance type change that doubles cost, or a new resource that wasn't budgeted. Sends alerts via webhook, Slack, or email.
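The post does not say which statistical method the agent uses. One common approach, shown here purely as an assumption, is a z-score against a trailing window of daily costs. The function name, window size, and threshold are all illustrative.

```python
from statistics import mean, stdev


def find_cost_anomalies(daily_costs, window=7, z_threshold=3.0):
    """Return (day_index, cost, z_score) for days far above the trailing-window mean."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: the z-score is undefined, so skip the day
        z = (daily_costs[i] - mu) / sigma
        if z > z_threshold:
            anomalies.append((i, daily_costs[i], z))
    return anomalies


costs = [100, 102, 98, 101, 99, 103, 97, 100, 240, 101]
anomalies = find_cost_anomalies(costs)
print([day for day, _, _ in anomalies])  # [8]
```

Note that a spike stays in the trailing window for the next several days and inflates the standard deviation, so a more careful version might exclude flagged days from the history or use a robust statistic such as the median absolute deviation.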

Agent 5: Tag Enforcer

Ensures every cluster and resource has the required cost allocation tags (cost center, team, environment, project). Can warn, auto-tag with defaults, or terminate untagged resources based on policy.
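The three policies can be sketched as a single pure function over a resource's tags. The tag keys, policy names (`warn`, `auto_tag`, `terminate`), and the `"unassigned"` placeholder are assumptions for illustration and may not match the agent's real configuration.

```python
# Hypothetical tag keys, based on the four tags named above.
REQUIRED_TAGS = ("cost_center", "team", "environment", "project")


def enforce_tags(tags, policy="warn", defaults=None):
    """Decide what to do with a resource based on its missing required tags."""
    missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
    if not missing:
        return {"action": "none", "missing": [], "tags": dict(tags)}
    if policy == "auto_tag":
        defaults = defaults or {}
        fixed = dict(tags)
        for key in missing:
            # Fall back to a visible placeholder when no default is configured.
            fixed[key] = defaults.get(key, "unassigned")
        return {"action": "auto_tag", "missing": missing, "tags": fixed}
    if policy in ("warn", "terminate"):
        # The caller performs the side effect; this function only decides.
        return {"action": policy, "missing": missing, "tags": dict(tags)}
    raise ValueError(f"unknown policy: {policy}")


result = enforce_tags(
    {"team": "data-eng", "environment": "prod"},
    policy="auto_tag",
    defaults={"cost_center": "cc-000"},
)
print(result["missing"])          # ['cost_center', 'project']
print(result["tags"]["project"])  # unassigned
```

Given how destructive `terminate` is, it seems sensible to run it in dry-run mode first, in the same way as the idle detection agent.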

At a glance: 5 open-source agents, 4 platform connectors, Apache 2.0 license.

How to Get Started

Getting started takes about 5 minutes:

  1. Install the package: pip install digital-tap
  2. Configure your platform connection (service principal for Databricks, IAM role for EMR, etc.)
  3. Run an agent in dry-run mode to see what it would do
  4. Review the output and adjust thresholds
  5. Enable live mode when you're confident

The documentation at github.com/digital-tap includes quick-start guides for each platform, configuration reference, and deployment guides for running agents as Kubernetes services or cron jobs.

Contributing

We welcome contributions. The areas where we'd particularly love community input:

Every contribution that ships in the open-source release also benefits Digital Tap platform customers — it's a positive-sum game. Contributors get recognition, the community gets better tools, and the platform gets a stronger foundation.

Open Source vs. Platform: Which Should You Use?

Here's an honest comparison:

Use the open-source agents if:

Use the Digital Tap platform if:

Many of our best customers started with the open-source agents, saw the value, and upgraded to the platform when their infrastructure grew. That's exactly the journey we designed.

"Open source isn't about giving away value. It's about proving value — transparently, verifiably, and at scale. The best products don't need to hide their code."

What's Next

This is version 1.0 of the open-source release. Over the coming months, we'll be adding:

We believe cloud optimization should be accessible to every organization, regardless of size or budget. Open-sourcing our agents is our commitment to that belief. Star the repo, try the agents, and let us know what you think.
