TelcoNews US - Telecommunications news for ICT decision-makers
Virtana launches AI observability for Dell factories

Thu, 14th May 2026
Sofiah Nichole Salivio, News Editor

Virtana has launched AI Factory Observability for Dell AI Factory environments and published survey findings suggesting many US enterprises lack system-level visibility into AI infrastructure.

The new offering gives customers a single view across Dell AI Factory deployments, linking GPU performance, storage, networking and workload activity. It covers Dell PowerEdge compute, PowerScale and ObjectScale storage, networking fabrics including InfiniBand, Ethernet and NVLink, and Dell's Smart Fabric Manager orchestration layer.

The announcement coincided with the release of research based on a survey of 788 US enterprise IT decision-makers. It found that 59% of respondents cannot automatically identify the root cause across infrastructure domains when an AI workload alert fires, while 66% operate AI infrastructure without reliable performance baselines.

Another 35% identified GPU cost and utilisation as their most difficult operational challenge. The results suggest a gap between the pace of AI deployment and the tools companies use to monitor and manage these systems.

Survey findings

The study indicates that enterprise AI use has moved beyond pilot projects for many organisations. Some 54% of respondents said they are already scaling AI across teams, while a further 23% said they are managing production workloads and expanding infrastructure.

At the same time, many reported holding back on related operational spending. The survey found that 56% are deferring legacy infrastructure modernisation and 54% are deprioritising cost-optimisation initiatives.

Cost pressures also appear to be reshaping infrastructure decisions. Eighty per cent of respondents said the cost of premium AI hardware is changing how they design infrastructure, with 60% shifting workloads across hybrid environments and 58% accelerating consolidation to improve efficiency.

Virtana also pointed to what it described as a disconnect between technical teams and senior management. It cited a 17-point gap between infrastructure and site reliability engineering practitioners and executives on automated root-cause capabilities: 69% of Infra/SRE teams reported a lack of automated cross-domain root cause analysis, compared with 52% of executives.

Operational gaps

The research suggests many companies still lack a clear picture of how AI systems behave in production. Only 34% of respondents described AI workload performance as highly predictable, and that figure fell to 25% at organisations with more than 50,000 employees.

A quarter of those surveyed said manual investigation across disconnected consoles remains their first response when an alert is triggered. The survey also found that 57% cited cost and efficiency metrics as a top challenge, 56% cited GPU utilisation tracking, and 52% cited data pipeline visibility.

Across all roles and revenue bands, 38% said they need unified visibility across AI and infrastructure layers, while 32% said they need AI-driven root cause analysis without manual correlation.

Paul Appleby, chief executive officer of Virtana, said the issue is becoming more important as AI systems take on a larger role in large companies.

"Modern enterprises, including banks, telcos, insurers and airlines, are increasingly dependent on AI-driven services. As a result, one of the greatest risks to the business is any disruption across these AI systems, where failures across applications or underlying infrastructure directly translate into business impact," Appleby said.

"AI systems function as interconnected systems, where infrastructure, data pipelines, token consumption, and model behavior continuously influence outcomes. Yet most organizations still monitor these elements in silos. Without system-wide understanding of these dependencies, they cannot explain how outcomes are produced, control cost, or determine whether those outcomes can be trusted," he added.

Dell integration

Virtana said its Dell AI Factory integration is intended to address that problem by correlating data across the full infrastructure stack rather than monitoring each layer separately. This includes node-level hardware telemetry from iDRAC, workload placement information through Smart Fabric Manager, and links between model behaviour, token usage and infrastructure consumption.

The system is designed to identify bottlenecks such as storage latency, fabric congestion, GPU contention and misallocated capacity. The aim is to help operators determine whether a slowdown or failure stems from hardware health, orchestration, networking or the workload itself.
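
The general idea of cross-domain correlation can be illustrated with a minimal Python sketch. It is not Virtana's method or API: the `DomainSeries` type, the `rank_suspects` function, the metric values and the threshold of 3.0 are all hypothetical, chosen only for illustration. The sketch compares each domain's readings around an alert with a healthy baseline and ranks the domains that deviate most.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class DomainSeries:
    """Metric samples for one infrastructure domain (hypothetical structure)."""
    domain: str            # e.g. "storage", "fabric", "gpu"
    baseline: list[float]  # samples from a known-healthy period
    incident: list[float]  # samples from the window around the alert


def deviation_score(series: DomainSeries) -> float:
    """How many baseline standard deviations the incident mean has moved."""
    mu = mean(series.baseline)
    sigma = pstdev(series.baseline) or 1e-9  # guard against a flat baseline
    return abs(mean(series.incident) - mu) / sigma


def rank_suspects(all_series: list[DomainSeries],
                  threshold: float = 3.0) -> list[tuple[str, float]]:
    """Return domains whose deviation meets the threshold, worst first."""
    scored = [(s.domain, deviation_score(s)) for s in all_series]
    flagged = [(d, z) for d, z in scored if z >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only: storage latency (ms), fabric utilisation (%),
    # GPU utilisation (%).
    sample = [
        DomainSeries("storage", [2.0, 2.2, 1.8, 2.0], [9.0, 10.0, 11.0]),
        DomainSeries("fabric", [10.0, 12.0, 8.0, 10.0], [13.0, 14.0, 15.0]),
        DomainSeries("gpu", [70.0, 72.0, 68.0, 70.0], [71.0, 70.0, 69.0]),
    ]
    for domain, score in rank_suspects(sample):
        print(f"{domain}: {score:.1f} standard deviations from baseline")
```

A ranking of this kind depends on having a baseline for every domain, which is the capability that 66% of survey respondents said they lack.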

Appleby said many organisations are making changes to live AI systems without fully understanding the effect on cost, resilience or output.

"Without system-level observability, organizations cannot determine how these changes affect outcomes, cost, or reliability. As a result, they are continuously optimizing AI systems they do not fully understand, introducing risk with every change," he said.

He added that the divide between companies that understand their AI systems and those that do not is becoming clearer.

"These are not abstract concerns. As AI becomes core enterprise infrastructure, a clear divide is emerging between organizations that understand how their systems produce outcomes and those that cannot explain or control them. Without visibility across models, tokens, GPUs, and infrastructure, teams absorb hidden cost, performance gaps, and ungoverned risk. The result is declining resilience, eroding trust, and constrained growth as AI becomes infrastructure that must be governed and optimized at scale. Those that understand their systems gain end-to-end visibility and control so they can optimize cost in real time, ensure reliable performance, and prove outcomes," Appleby said.

Amitkumar Rathi, chief product officer of Virtana, said: "AI workloads at scale are complex by nature; they span GPUs, storage, networking, and orchestration. Performance depends on how all of those layers interact."