IBM AIOps Training

Introduce the demo context

📣 Narration

Welcome to this demonstration of the CloudPak for AIOps platform. In this demo, I am going to show you how CloudPak for AIOps can help your operations team proactively identify, diagnose, and resolve incidents across mission-critical workloads.

You’ll see how:

  • CloudPak for AIOps intelligently correlates multiple disparate sources of information such as logs, metrics, events, tickets and topology
  • All of this information is condensed and presented in actionable alerts instead of large quantities of unrelated alerts
  • You can resolve a problem within seconds to minutes of being notified using CloudPak for AIOps’ automation capabilities

During the demonstration, we will be using a sample application called RobotShop, which serves as a proxy for any type of app. The application is built on a microservices architecture, and the services are running on a Kubernetes cluster.

🚀 Action: Use the introductory demo PowerPoint presentation to illustrate the narration. Adapt your details on Slides 1 and 13.

📣 Narration

Slide 2: Let’s look at the environment that we have set up. Our sample application, “RobotShop”, is running as a set of microservices in a Kubernetes cluster. Typically, the Operations team maintaining such an application has a collection of tools through which it collects various types of data.

Slide 3: Here we have several systems that are sending Events into AIOps, such as:

  • GitHub
  • Turbonomic
  • Instana
  • Selenium
  • Falco (Sysdig)

Those Events are grouped into Alerts to massively reduce the number of signals that have to be handled. We usually observe a reduction of about 98-99%. This means that out of 20,000 Events we get about 200-300 Alerts that can be further prioritised.
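ℹ️ Note: If someone in the audience asks how the reduction percentage relates to the Alert count, the arithmetic can be sketched as below. This is only a back-of-the-envelope check of the figures quoted in the narration; the function name `alerts_after_reduction` is made up for this illustration and is not part of any product API.

```python
def alerts_after_reduction(event_count: int, reduction: float) -> int:
    """Return how many signals remain after removing `reduction` (0..1) of the events."""
    return round(event_count * (1.0 - reduction))


events = 20_000  # event volume quoted in the narration
for reduction in (0.98, 0.985, 0.99):
    remaining = alerts_after_reduction(events, reduction)
    print(f"{reduction:.1%} reduction -> {remaining} Alerts")
```

A 99% reduction leaves 200 Alerts, 98.5% leaves 300 and 98% leaves 400, so the 200-300 range quoted in the narration corresponds to the upper half of the 98-99% band.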

Slide 4: AIOps also ingests Logs from ElasticSearch (this could be Splunk or another Log Aggregator). The Log Anomaly detection is trained on a well-running system and is able to detect anomalies and outliers. If an Anomaly is detected, it is grouped with the other Events.

Slide 5: AIOps also ingests Metrics from Instana (this could be Dynatrace, NewRelic or others). The Metric Anomaly detection is trained on a well-running system and creates dynamic baselines. Using several algorithms, it is able to detect anomalies and outliers. If an Anomaly is detected, it is also grouped with the other Events.
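ℹ️ Note: To explain the idea of a baseline learned from a healthy system, a toy example can help. The sketch below is not the algorithm used by the product, which is not described in this script. It assumes the simplest possible stand-in: a fixed band of mean ± k standard deviations computed from healthy samples, whereas a truly dynamic baseline would recompute that band over time. All names and sample values are hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(healthy_samples, k=3.0):
    """Derive a (lower, upper) band from metrics recorded on a healthy system."""
    mu = mean(healthy_samples)
    sigma = stdev(healthy_samples)
    return mu - k * sigma, mu + k * sigma


def find_anomalies(samples, band):
    """Return (index, value) pairs for samples that fall outside the band."""
    lower, upper = band
    return [(i, v) for i, v in enumerate(samples) if v < lower or v > upper]


# Hypothetical latency readings (ms) from a well-running system.
healthy_latency_ms = [101, 99, 102, 98, 100, 103, 97, 100]
band = fit_baseline(healthy_latency_ms)

# Hypothetical live readings; the third one is far outside the band.
live_latency_ms = [100, 102, 250, 99]
print(find_anomalies(live_latency_ms, band))
```

With these sample values the band is 94 to 106 ms, so only the 250 ms reading is flagged.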

Slide 6: Alerts that are relevant to the same Incident are packaged into a so-called Story. The Story is enriched and updated with information as it becomes available.

Slide 7: One example is the Topology information. Not only will AIOps tell me that I have a problem and present all relevant Events, but it will also tell me where in the system topology the problem is located.

Slide 8: Furthermore, the Story is enriched with past resolution information coming from ServiceNow tickets. I’ll explain this in more detail during the demo.

Slide 9: The Stories can either be examined in the AIOps web interface or be pushed to Slack or Teams if your teams are using a ChatOps approach.

Slide 10: If Operations or SREs have created Runbooks, AIOps can automatically trigger a Runbook to mitigate the problem.

ℹ️ Note: We are NOT using Slack in this demo.

📣 Narration

Now let’s start the demo.

Page last updated: 03 November 2022