What Is Message Clustering?
Message Clustering is the process of grouping together logs and events that might contain different content but originated from the same context.
Why Cluster Messages?
Take the following example of two similar-but-very-different scenarios:
Grouping these log lines together and then analyzing the group’s behavior enables Sophie to detect abnormalities in the behavior of the group, or of some sub-context within it, over time.
How Does Sophie Cluster Messages?
Sophie uses several different algorithms to cluster messages:
Using External IDs
The first algorithm relies on an external ID. Such a message ID is present, for example, in Microsoft Eventlog (EventID), in Linux Journal logs (MSGID), in Oracle WebLogic (BEA), and in many other log sources.
If such an ID is present, Sophie will use it as the “Cluster ID”. Easy!
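As a rough illustration of this lookup, the sketch below checks a parsed event for a known ID field and builds a cluster ID from it. The dictionary keys, the `cluster_id_from_external_id` helper, and the `field:value` format are assumptions made for the example; they are not Sophie's actual field names or ID format.

```python
from typing import Optional

# Hypothetical field names; real log sources may expose the ID under other keys.
EXTERNAL_ID_FIELDS = ("EventID", "MSGID", "BEA")


def cluster_id_from_external_id(event: dict) -> Optional[str]:
    """Return a cluster ID built from the first external ID found, or None."""
    for field in EXTERNAL_ID_FIELDS:
        value = event.get(field)
        # Skip missing or blank values so empty IDs never form a cluster.
        if value is not None and str(value).strip():
            return f"{field}:{value}"
    # No external ID: the caller would fall back to another algorithm.
    return None
```

When the function returns `None`, the event would be handed to one of the other clustering algorithms.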
The second algorithm is very straightforward: execute a set of regular expressions against the log line to replace all the varying parts with placeholders, then check whether the resulting pattern has been seen before.
See the following example:
The example shows a log line and the resulting “pattern”, with the varying parts of the line replaced by placeholders.
All log lines that share the same resulting pattern are assigned the same Cluster ID.
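The regex-based approach described above can be sketched in a few lines. The replacement rules, the placeholder names, and the integer cluster IDs below are illustrative assumptions only; Sophie's actual rules and ID scheme are not documented here.

```python
import re

# Illustrative rules only. Order matters: more specific rules run first.
REPLACEMENTS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

# Maps each pattern seen so far to its (hypothetical) integer cluster ID.
seen_patterns = {}


def to_pattern(line: str) -> str:
    """Replace the varying parts of a log line with placeholders."""
    for regex, placeholder in REPLACEMENTS:
        line = regex.sub(placeholder, line)
    return line


def cluster_id(line: str) -> int:
    """Return the ID of the line's pattern, registering the pattern if new."""
    pattern = to_pattern(line)
    return seen_patterns.setdefault(pattern, len(seen_patterns))
```

With these rules, `Connection from 10.0.0.1 port 5541` and `Connection from 192.168.1.20 port 22` both reduce to `Connection from <IP> port <NUM>` and therefore land in the same cluster.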
Resemblance Algorithm (GBP)
The resemblance algorithm is a proprietary algorithm that combines graph-based and Levenshtein-distance-based approaches to clustering messages.
This is a complex algorithm and is beyond the scope of this document. Contact us if you’d like to know more about it!
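For readers unfamiliar with the Levenshtein distance mentioned above, here is the textbook dynamic-programming computation. This is not Sophie's resemblance algorithm, only the standard edit-distance measure that such approaches build on.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

Two log lines with a small distance relative to their length differ in only a few characters, which is one signal that they may belong to the same cluster.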
Why Should I Improve the Clustering Accuracy?
Clustering will sometimes fail on particular log messages. This often happens with logs that contain a large portion of free text, e.g. when a log line prints tweets or emails. In these cases, Sophie will create many new clusters, although such clusters are quickly deleted because Sophie recognizes false clusters.
Hence, the motivation for improving the accuracy is twofold:
a) The high rate of false-cluster creation slows down Sophie. When this happens, you will see a warning message in the web-app, similar to this one:
b) When a log isn’t clustered properly, Sophie cannot detect anomalies in that log line and its properties.
How Do I Improve the Clustering Accuracy?
Start by identifying the offending log lines. In the web-app, go to Settings→Administration→Diagnostics.
Under the Patterns tab, you will see the most recently identified patterns. Experiment with the different sorting options until you identify a log line containing a part that varies, i.e. one that should have been replaced with a placeholder. Consider the following example:
It’s easy to see that there is a hash that isn’t being replaced properly.
Once we’ve identified the offending pattern, we head to the Source-Type that is responsible for handling it and open its Clusters settings:
On the Clusters page, you can:
1) Test the “patterning” of some text
2) Add Regular Expressions that will replace the problematic text with some placeholder
Add a new “replacement” by clicking on the “New” button:
After saving your example, you can test different messages to see that your replacement works as expected.
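Before entering a replacement in the web-app, it can help to try the regular expression locally against a few sample messages. The sketch below does this for a hash that isn’t being replaced; the regular expression, the `<HASH>` placeholder, and the sample messages are all hypothetical, and the regex dialect accepted by the Clusters page may differ from Python’s.

```python
import re

# Hypothetical replacement: a run of 32 to 64 hex characters becomes <HASH>.
HASH_RE = re.compile(r"\b[0-9a-fA-F]{32,64}\b")


def apply_replacement(message: str) -> str:
    """Apply the candidate replacement to a single message."""
    return HASH_RE.sub("<HASH>", message)


# Made-up sample messages that differ only in the hash.
samples = [
    "Uploaded blob d41d8cd98f00b204e9800998ecf8427e to bucket logs",
    "Uploaded blob 9e107d9d372bb6826bd81d3542a419d6 to bucket logs",
]

# If the replacement works, every sample collapses to one pattern.
patterns = {apply_replacement(m) for m in samples}
print(patterns)
```

If the resulting set holds a single pattern, the replacement covers the varying part. It is also worth confirming that messages without a hash pass through unchanged.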