We Open-Sourced Our Cloud Optimization Agents — Here's Why

Today we're releasing the core of Digital Tap AI's cloud optimization agent framework as open source. The repository is live at github.com/digital-tap, and everything you need to start optimizing your Databricks, EMR, Dataproc, and Kubernetes clusters is included.

This wasn't an obvious decision. We spent 18 months building these agents. They're the foundation of a product that saves our customers millions of dollars annually. Giving them away for free seems, on the surface, like terrible business strategy.

It's actually the best business decision we've made. Here's why.

Why Open Source?

1. Cloud Optimization Shouldn't Be a Black Box

When an autonomous agent decides to hibernate your production cluster, you need to understand why. When it migrates workloads from spot instances, you need to see the decision logic. When it right-sizes your instances, you need to verify it's making the right tradeoffs.

Closed-source optimization tools ask you to trust them with your production infrastructure based on marketing claims. Open-source tools let you read the code, understand the algorithms, and verify the behavior before you deploy.

For enterprises running critical data infrastructure, that transparency isn't a nice-to-have — it's a requirement. Security teams want to audit. Platform engineers want to understand. Compliance teams want to verify. Open source satisfies all of them.

2. The Real Value Is in the Platform, Not the Agents

The individual agents — idle detection, right-sizing, spot orchestration — are valuable, but they're not the whole product. The Digital Tap platform adds:

Agent coordination — 27 agents working together through a shared intelligence layer, making decisions that account for each other's actions
Predictive ML models — Trained on your specific usage patterns across weeks of historical data, achieving prediction accuracy that improves over time
Enterprise dashboard — Real-time visibility into savings, waste, utilization, and water impact across all platforms
Compliance and audit — Complete decision logs, approval workflows, and integration with enterprise governance tools
Managed infrastructure — We run and scale the optimization platform so you don't have to
Savings guarantee — 3-4× your subscription cost in verified savings, or a full refund

Open-sourcing the agents doesn't commoditize our product — it demonstrates its foundation. Teams that try the open-source agents and see 15-20% savings naturally want to explore what the full platform with coordinated agents and ML models can deliver (typically 35-42%).

3. The Problem Is Too Important for One Company

$44.5 billion in annual cloud waste is an industry-scale problem. Even if Digital Tap becomes the dominant optimization platform, we'll serve thousands of organizations at most. There are hundreds of thousands of companies wasting money on idle clusters.

Open-sourcing our agents means any organization, including those who would never buy a commercial product, can start reducing waste today. A startup with a $5K/month Databricks bill and plenty of idle cluster time could deploy our idle detection agent and save as much as $1,500/month without paying anything.

That's good for the industry, good for the planet (less wasted energy and water), and ultimately good for us — because it establishes Digital Tap as the standard for cloud optimization.

What's in the Open-Source Release

The release includes the core agent framework and five production-ready agents. Everything is Apache 2.0 licensed.

Core Framework

digital-tap/
├── core/
│   ├── agent.py          # Base agent class with lifecycle management
│   ├── scheduler.py      # Agent scheduling and coordination
│   ├── metrics.py        # Metric collection and aggregation
│   ├── config.py         # Configuration management
│   └── connectors/       # Platform connectors
│       ├── databricks.py
│       ├── emr.py
│       ├── dataproc.py
│       └── kubernetes.py
├── agents/
│   ├── idle_detection/   # Detect and hibernate idle clusters
│   ├── right_sizing/     # Analyze and recommend instance changes
│   ├── spot_manager/     # Basic spot instance lifecycle management
│   ├── cost_anomaly/     # Detect unusual cost patterns
│   └── tag_enforcer/     # Ensure cost allocation tags exist
├── cli/                  # Command-line interface
├── tests/
└── docs/
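To make the `core/agent.py` lifecycle concrete, here is a minimal sketch of what a base agent with a collect, decide, apply loop and a built-in dry-run switch could look like. The names `BaseAgent`, `Action`, `collect`, `decide`, `apply`, `run_once`, and the toy `StaticIdleAgent` are hypothetical illustrations, not the framework's actual API; check the repository for the real base class.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A proposed change to one resource (hypothetical type)."""
    resource_id: str
    kind: str      # e.g. "hibernate", "resize"
    reason: str


class BaseAgent(ABC):
    """Hypothetical collect -> decide -> apply lifecycle; not the real core/agent.py."""

    def __init__(self, dry_run: bool = True) -> None:
        self.dry_run = dry_run      # safe by default: report, don't act
        self.log: list[str] = []    # audit trail of every decision

    @abstractmethod
    def collect(self) -> dict:
        """Pull metrics through a platform connector."""

    @abstractmethod
    def decide(self, metrics: dict) -> list[Action]:
        """Turn metrics into proposed actions."""

    @abstractmethod
    def apply(self, action: Action) -> None:
        """Execute one action against the platform."""

    def run_once(self) -> list[Action]:
        """One scheduler tick: collect, decide, then log and (if live) apply."""
        actions = self.decide(self.collect())
        for action in actions:
            verb = "DRY-RUN would" if self.dry_run else "Applying"
            self.log.append(f"{verb} {action.kind} {action.resource_id}: {action.reason}")
            if not self.dry_run:
                self.apply(action)
        return actions


class StaticIdleAgent(BaseAgent):
    """Toy agent that treats a fixed list of cluster IDs as idle."""

    def __init__(self, idle_ids: list[str], dry_run: bool = True) -> None:
        super().__init__(dry_run)
        self.idle_ids = idle_ids
        self.applied: list[str] = []

    def collect(self) -> dict:
        return {"idle": self.idle_ids}

    def decide(self, metrics: dict) -> list[Action]:
        return [Action(cid, "hibernate", "idle") for cid in metrics["idle"]]

    def apply(self, action: Action) -> None:
        self.applied.append(action.resource_id)


if __name__ == "__main__":
    agent = StaticIdleAgent(["cluster-a"])
    agent.run_once()
    print(agent.log)   # prints ['DRY-RUN would hibernate cluster-a: idle']
```

Putting the dry-run check in the base class means every agent gets the same audit log and no-op behavior without reimplementing it, which is what makes a uniform dry-run mode across agents practical.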

Agent 1: Idle Detection

The idle detection agent monitors cluster utilization via platform APIs and takes action when clusters sit idle beyond a configurable threshold. It supports Databricks, EMR, Dataproc, and Kubernetes.

# Install and run the idle detection agent
pip install digital-tap

# Configure your Databricks connection
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"

# Run the idle detection agent
digital-tap agent run idle_detection \
  --idle-threshold 10m \
  --action hibernate \
  --dry-run  # Remove for production

In dry-run mode, the agent reports what it would do without taking action — perfect for building confidence before enabling automation.
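The threshold logic can be sketched as follows, assuming utilization arrives as time-ordered (timestamp, percent) samples per cluster and that "idle" means below a hypothetical 5% cutoff. The functions `idle_duration`, `plan_hibernation`, and `run_idle_detection` are illustrative, not the agent's real internals, and the `hibernate` callback stands in for whatever platform API call the connector would make.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Callable

# (timestamp, utilization percent), sorted by timestamp
Samples = list[tuple[datetime, float]]


def idle_duration(samples: Samples, cutoff: float) -> timedelta:
    """Time spanned by the trailing run of samples below `cutoff` percent."""
    if not samples or samples[-1][1] >= cutoff:
        return timedelta(0)          # busy right now, so not idle
    run_start = samples[-1][0]
    for ts, util in reversed(samples):
        if util >= cutoff:           # most recent busy sample ends the run
            break
        run_start = ts
    return samples[-1][0] - run_start


def plan_hibernation(metrics: dict[str, Samples], threshold: timedelta,
                     cutoff: float = 5.0) -> list[str]:
    """IDs of clusters idle for at least `threshold` (5% cutoff is an assumption)."""
    return sorted(cid for cid, samples in metrics.items()
                  if idle_duration(samples, cutoff) >= threshold)


def run_idle_detection(metrics: dict[str, Samples], threshold: timedelta,
                       hibernate: Callable[[str], None],
                       dry_run: bool = True) -> list[str]:
    """Report the plan in dry-run mode; call `hibernate` for each ID when live."""
    plan = plan_hibernation(metrics, threshold)
    for cid in plan:
        if dry_run:
            print(f"[dry-run] would hibernate {cid}")
        else:
            hibernate(cid)
    return plan


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    metrics = {
        # 2% utilization for 15 one-minute samples: idle for 14 minutes
        "etl-nightly": [(t0 + timedelta(minutes=m), 2.0) for m in range(15)],
        # busy in the latest sample: not idle
        "bi-adhoc": [(t0 + timedelta(minutes=m), 60.0 if m == 14 else 1.0)
                     for m in range(15)],
    }
    run_idle_detection(metrics, timedelta(minutes=10), hibernate=print)
    # prints: [dry-run] would hibernate etl-nightly
```

Measuring the trailing idle run, rather than averaging over a window, means a single busy sample resets the clock. That errs toward leaving alone a cluster someone just used.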

Agent 2: Right-Sizing

Analyzes actual CPU, memory, and I/O utilization over a configurable lookback window and generates right-sizing recommendations. On Databricks, recommendations can be applied automatically through cluster policies.

# Run right-sizing analysis
digital-tap agent run right_sizing \
  --lookback 7d \
  --target-utilization 70 \
  --output recommendations.json
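The `--target-utilization 70` setting suggests a simple sizing rule: choose the smallest instance whose capacity keeps observed peak usage at or below 70%. This sketch assumes "peak" means the 95th percentile over the lookback window and uses a small illustrative catalog of AWS m5 sizes; the agent's real percentile choice, catalog source, and tie-breaking may differ, and `recommend` and `percentile` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int


# Illustrative catalog (AWS m5 sizes); a real agent would query the provider.
CATALOG = (
    InstanceType("m5.xlarge", 4, 16),
    InstanceType("m5.2xlarge", 8, 32),
    InstanceType("m5.4xlarge", 16, 64),
)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(cpu_used: list[float], mem_used_gib: list[float],
              catalog: tuple[InstanceType, ...] = CATALOG,
              target_utilization: float = 70.0,
              pct: float = 95.0) -> InstanceType | None:
    """Smallest instance that keeps p95 CPU and memory at or below the target."""
    scale = target_utilization / 100
    need_cpu = percentile(cpu_used, pct) / scale        # vCPUs required
    need_mem = percentile(mem_used_gib, pct) / scale    # GiB required
    fits = [i for i in catalog if i.vcpus >= need_cpu and i.memory_gib >= need_mem]
    # None means nothing in the catalog is large enough
    return min(fits, key=lambda i: (i.vcpus, i.memory_gib), default=None)


if __name__ == "__main__":
    # p95 usage of 4 vCPUs and 20 GiB needs 4 / 0.7 ≈ 5.7 vCPUs and
    # 20 / 0.7 ≈ 28.6 GiB, so m5.2xlarge is the smallest fit.
    print(recommend([4.0] * 168, [20.0] * 168).name)   # prints m5.2xlarge
```

Sizing to a percentile rather than the absolute maximum keeps one outlier hour from pinning the cluster to its largest size; the 30% headroom implied by a 70% target absorbs ordinary bursts.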

Agent 3: Spot Manager

Manages spot instance lifecycle with automatic fallback to on-demand. Monitors spot interruption warnings and handles graceful task migration. This is the community version — the full platform adds predictive interruption avoidance.
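Fallback to on-demand can be sketched as a provision-then-drain routine. This is a minimal illustration under stated assumptions: `request_spot`, `request_on_demand`, and `drain` are hypothetical callbacks standing in for cloud and platform API calls, and `CapacityUnavailable` is an invented exception. Provisioning the replacement before draining matters because spot interruption notices are short (on AWS, two minutes).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class CapacityUnavailable(Exception):
    """Raised by a (hypothetical) provisioner when spot capacity is exhausted."""


@dataclass
class SpotFallbackManager:
    request_spot: Callable[[str], str]        # instance type -> new node ID
    request_on_demand: Callable[[str], str]   # same, at on-demand pricing
    drain: Callable[[str], None]              # migrate tasks off a node
    events: list[str] = field(default_factory=list)

    def provision(self, instance_type: str) -> str:
        """Prefer spot; fall back to on-demand when spot capacity is unavailable."""
        try:
            node = self.request_spot(instance_type)
            self.events.append(f"spot:{node}")
        except CapacityUnavailable:
            node = self.request_on_demand(instance_type)
            self.events.append(f"on-demand:{node}")
        return node

    def on_interruption_warning(self, node_id: str, instance_type: str) -> str:
        """Bring up a replacement first, then drain the node being reclaimed."""
        replacement = self.provision(instance_type)
        self.drain(node_id)
        self.events.append(f"drained:{node_id}")
        return replacement


if __name__ == "__main__":
    def no_spot(instance_type: str) -> str:
        raise CapacityUnavailable(instance_type)

    mgr = SpotFallbackManager(no_spot, lambda t: "od-1", lambda n: None)
    mgr.on_interruption_warning("spot-7", "m5.xlarge")
    print(mgr.events)   # prints ['on-demand:od-1', 'drained:spot-7']
```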

Agent 4: Cost Anomaly Detection

Uses statistical analysis to detect unusual cost patterns — a sudden spike in cluster count, an instance type change that doubles cost, or a new resource that wasn't budgeted. Sends alerts via webhook, Slack, or email.
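The release notes don't name the exact statistical test; a rolling z-score against a trailing window is one common choice, sketched here under that assumption. `cost_anomalies`, `window`, `z_threshold`, and `min_stdev` are hypothetical names, and alert delivery (webhook, Slack, email) is left out.

```python
from __future__ import annotations

import statistics


def cost_anomalies(daily_costs: list[float], window: int = 14,
                   z_threshold: float = 3.0, min_stdev: float = 1.0) -> list[int]:
    """Indices of days more than `z_threshold` standard deviations away from the
    mean of the preceding `window` days (a rolling z-score)."""
    flagged = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mean = statistics.fmean(history)
        # Floor the spread so a flat history doesn't turn pennies into alerts
        stdev = max(statistics.pstdev(history), min_stdev)
        if abs(daily_costs[i] - mean) / stdev > z_threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    costs = [100.0] * 14 + [101.0, 250.0]   # steady spend, then a spike
    print(cost_anomalies(costs))            # prints [15]
```

Without the `min_stdev` floor, a perfectly flat history has zero spread and any change, even a one-dollar wobble, would produce an infinite z-score.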

Agent 5: Tag Enforcer

Ensures every cluster and resource has the required cost allocation tags (cost center, team, environment, project). Can warn, auto-tag with defaults, or terminate untagged resources based on policy.
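The three policy modes map naturally onto one decision function. In this sketch the tag key spellings (`cost_center`, `team`, `environment`, `project`), the `Policy` enum, `enforce`, and the `"unassigned"` default are assumptions for illustration; the real agent's configuration format may differ, and a live implementation would call the platform API instead of returning a decision string.

```python
from __future__ import annotations

from enum import Enum

# Hypothetical spellings of the four required cost allocation tags
REQUIRED_TAGS = ("cost_center", "team", "environment", "project")


class Policy(Enum):
    WARN = "warn"
    AUTO_TAG = "auto_tag"
    TERMINATE = "terminate"


def enforce(resource_id: str, tags: dict[str, str], policy: Policy,
            defaults: dict[str, str] | None = None) -> tuple[str, dict[str, str]]:
    """Return (decision, resulting tags) for one resource under `policy`."""
    missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
    if not missing:
        return "compliant", tags
    if policy is Policy.WARN:
        return f"warn: {resource_id} missing {', '.join(missing)}", tags
    if policy is Policy.AUTO_TAG:
        defaults = defaults or {}
        filled = {key: defaults.get(key, "unassigned") for key in missing}
        return "auto-tagged", {**tags, **filled}
    # Policy.TERMINATE: leave tags alone; the caller would shut the resource down
    return "terminate", tags


if __name__ == "__main__":
    print(enforce("cluster-42", {"team": "data"}, Policy.WARN)[0])
    # prints: warn: cluster-42 missing cost_center, environment, project
```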

At a glance: 5 open-source agents, 4 platform connectors, Apache 2.0 license.

How to Get Started

Getting started takes about 5 minutes:

Install the package: pip install digital-tap
Configure your platform connection (service principal for Databricks, IAM role for EMR, etc.)
Run an agent in dry-run mode to see what it would do
Review the output and adjust thresholds
Enable live mode when you're confident

The documentation at github.com/digital-tap includes quick-start guides for each platform, configuration reference, and deployment guides for running agents as Kubernetes services or cron jobs.

Contributing

We welcome contributions. The areas where we'd particularly love community input:

New platform connectors — Snowflake, BigQuery, Redshift, and other data platforms
New agents — Storage optimization, network cost analysis, reserved instance recommendations
Improved heuristics — Better idle detection for specific workload patterns
Testing — More platforms, more edge cases, more environments
Documentation — Deployment guides, best practices, case studies

Every contribution that ships in the open-source release also benefits Digital Tap platform customers — it's a positive-sum game. Contributors get recognition, the community gets better tools, and the platform gets a stronger foundation.

Open Source vs. Platform: Which Should You Use?

Here's an honest comparison:

Use the open-source agents if:

Your cloud data spend is under $50K/month
You have platform engineers who can deploy and manage the agents
You want to start with basic optimization and grow from there
You need to audit the code before trusting it with production infrastructure

Use the Digital Tap platform if:

Your spend exceeds $50K/month (at that scale, the guaranteed 3-4× savings more than cover the subscription)
You want coordinated multi-agent optimization with ML-powered prediction
You need enterprise features: audit logs, approval workflows, compliance
You want a savings guarantee and managed infrastructure

Many of our best customers started with the open-source agents, saw the value, and upgraded to the platform when their infrastructure grew. That's exactly the journey we designed.

"Open source isn't about giving away value. It's about proving value — transparently, verifiably, and at scale. The best products don't need to hide their code."

What's Next

This is version 1.0 of the open-source release. Over the coming months, we'll be adding:

Kubernetes-native deployment — Helm charts for running agents as K8s services
Terraform provider — Infrastructure-as-code for agent configuration
Additional agents — Storage optimization and network cost analysis
Community ML models — Pre-trained prediction models that work without the full platform

We believe cloud optimization should be accessible to every organization, regardless of size or budget. Open-sourcing our agents is our commitment to that belief. Star the repo, try the agents, and let us know what you think.

Start Optimizing Today

Try the open-source agents for free, or explore the full platform with a savings guarantee.
