As mentioned in Part 1 of my blog, our project this year, titled AI LEAP, covers a number of areas. Yesterday, I examined how machine learning and AI can improve the quality and accuracy of inventory data; today, I will look at the detection of anomalies and deviating signals across several use cases.
Event Analytics and Machine Learning
Using solutions from Federos and Arago, with data analysis supported by algorithms developed by Galileo Software, the AI LEAP project focused on three areas where AI and machine learning could be applied to drive more intelligent automation.
1. Abnormal Activity
In this use case, we impacted available bandwidth by performing a large data upload on a network. Monitoring traffic in and out of an office environment, we would expect to see high traffic Monday through Friday from 9 AM to 5 PM and lower traffic the rest of the time. Any deviation from this expected pattern should trigger a notification.
In the figure below, we can see the normal daily pattern for Monday through Wednesday, but a massive spike in outbound bandwidth appears at 4 AM on Thursday. Although the bandwidth issue is detected as an abnormal activity, in this case the defined thresholds are not breached and therefore only warning severity events are generated.
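A minimal sketch of this kind of pattern-deviation check is shown below. The baseline values, thresholds, and severity labels are illustrative assumptions, not the actual Federos/Arago implementation:

```python
# Expected outbound bandwidth (Mbps) per hour for a typical weekday,
# e.g. learned from historical traffic: low overnight, high 9 AM-5 PM.
def expected_bandwidth(hour):
    return 800.0 if 9 <= hour < 17 else 50.0

def classify(hour, observed, critical_threshold=2000.0, deviation_factor=3.0):
    """Return an event severity for one bandwidth sample."""
    baseline = expected_bandwidth(hour)
    if observed >= critical_threshold:
        return "critical"           # hard threshold breached
    if observed > deviation_factor * baseline:
        return "warning"            # abnormal versus the expected pattern
    return "normal"

# A 4 AM spike: far above the overnight baseline, but below the
# critical threshold, so only a warning-severity event is generated.
print(classify(hour=4, observed=1500.0))   # warning
print(classify(hour=4, observed=40.0))     # normal
```

This mirrors the scenario in the figure: the spike is clearly abnormal for 4 AM, yet no hard threshold is breached, so the event stays at warning severity.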
The challenge with lower severity events is that they are commonly missed by operations as a) they are not major or critical severity events and therefore don’t trigger any actions and/or b) they are lost in the ‘noise’ and ignored or overlooked. However, ignoring multiple or chronic lower severity events is a risk.
So we now have multiple warning events being generated for the bandwidth anomaly. We used Event Analytics to monitor these threshold warnings and ensure that significant abnormal events were not missed. Event Analytics watches for abnormal behavior across multiple instances; when an issue is detected, we can issue a trouble ticket or perform an automated action.
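The escalation logic for chronic low-severity events can be sketched as follows. This is a hypothetical illustration, assuming a simple recurrence count per instance; the real Event Analytics correlation is more sophisticated:

```python
from collections import Counter

def escalate(events, min_occurrences=5):
    """events: list of (instance_id, severity) tuples observed in a window.
    Returns the instances whose recurring warnings warrant action
    (e.g. a trouble ticket or an automated remediation)."""
    warnings = Counter(inst for inst, sev in events if sev == "warning")
    return [inst for inst, n in warnings.items() if n >= min_occurrences]

# router-1 keeps warning; individually each event is easy to ignore,
# but together they cross the escalation threshold.
events = [("router-1", "warning")] * 6 + [("router-2", "warning")] * 2
print(escalate(events))   # ['router-1']
```

The point of the design is that no single warning triggers action; it is the accumulation across a window that does, which is exactly what manual triage tends to miss.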
The solution we developed for this use case once again demonstrates the intelligent use of data to help NOCs and SOCs manage their environments more effectively.
2. Prediction and Trend Analysis
To demonstrate some of the predictive capabilities, AI LEAP ran prediction and trend analysis algorithms on the same bandwidth performance data used in the above use case to forecast overcapacity. By detecting abnormal data points outside a calculated confidence band, automation can be used to spin up additional resources before an issue or outage occurs. This is an especially relevant and important use case for 5G, where critical services rely on guaranteed, continuous low-latency connectivity; experiencing performance issues in this case is not an option.
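The trend-based forecasting described above can be sketched as below. This is a minimal illustration, assuming a simple linear trend; it omits the confidence-band calculation the project used, and all function names, sample data, and the capacity limit are assumptions:

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    xs, ys = list(xs), list(ys)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def forecast_breach(samples, capacity, horizon=12):
    """samples: recent bandwidth utilisation per interval.
    Returns the first future step predicted to exceed capacity,
    or None if no breach is forecast within the horizon."""
    n = len(samples)
    slope, intercept = fit_line(range(n), samples)
    for step in range(1, horizon + 1):
        if slope * (n - 1 + step) + intercept > capacity:
            return step
    return None

# Steadily rising utilisation heading toward a 100-unit capacity limit:
# a breach is forecast in advance, leaving time to spin up resources.
print(forecast_breach([60, 65, 70, 75, 80, 85], capacity=100))
```

When the function returns a step number, automation has that many intervals of lead time to provision additional capacity before the predicted breach.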
3. AI Standards and Governance
The final AI LEAP use case investigates how to manage AI operational policy governance. We used machine learning (ML) rules to analyze data recording how many times an ML job had been executed. Based on this analysis, we can detect anomalies and deviations from the expected ML processing and perform automated actions if needed (e.g. pause execution of an ML job).
For example, using the use case I wrote about previously: if, after detecting a potential missing link in the topology, the ML job normally executes 10 times in a 5-minute window, and we then observe it executing 200 times in a 5-minute period, that execution rate is itself an anomaly. It would be raised as an event to be investigated further, or could trigger an automated action.
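The example above can be expressed as a simple rate check. The expected rate and tolerance factor are illustrative assumptions:

```python
def check_ml_job_rate(executions_in_window, expected=10, tolerance_factor=5):
    """Governance check for one 5-minute window of ML job executions.
    A job normally running ~10 times per window that suddenly runs 200
    times is itself anomalous, so an event is raised (which could in
    turn pause the job pending investigation)."""
    if executions_in_window > expected * tolerance_factor:
        return "raise_event"
    return "ok"

print(check_ml_job_rate(10))    # ok
print(check_ml_job_rate(200))   # raise_event
```

Checks like this apply anomaly detection to the ML pipeline itself, which is the essence of the governance use case: the automation watches the automation.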
In summary, we know that AI and machine learning have an essential role in helping organizations achieve their digital transformation initiatives. Our findings in the AI LEAP Catalyst project have demonstrated that these technologies can be used effectively to improve the quality of data, discover and exploit new data sets, optimize task automation, and improve customer experience.