When Sophie encounters a new type of data, a new source type is created automatically.

  • It is important to double-check that the data has been parsed and constructed correctly.
  • Go to Source Types and search for auto-created sources (automatically created sources have a “robot” icon, as opposed to sources that were created manually):
  • Click on “Structure”, and check that the following has been done:
  • All five labels (e.g. Message, Severity, Timestamp) are in place (when applicable). Please note that assigning the labels is critical for the Anomaly Detection algorithms to work properly.
  • Review the summary view at the bottom of the screen. Make sure there are no Errors, Timestamp Failures, Severity Extraction Failures, etc.
  • If there are issues, click on the red question mark, copy the failing samples from the popup screen, paste them into the search line above, and use the JavaScript console to correct them.
  • Pay attention to the Duration (in seconds) and make sure it does not exceed a reasonable value (0.05).
  • Verify there are no poorly parsed fields (e.g. the message contains parts of other fields, fields that were split incorrectly, etc.). Here is an example of how a message should be extracted (compare the raw message to the line marked as “message”):
  • Browse 15 examples (by clicking on the arrow icons) to verify that Sophie's extraction is correct and applicable in all cases (try to find different examples).
  • Browsing through the examples will also help you understand which additional fields to extract: consider whether there is any information you need to measure (Meter/Gauge/Histogram) or to use as context for the Auto Root Cause analysis (ARC only). An example of such a field could be the name of a component (such as "cinder.db.sqlalchemy.api" in the example above) that was not extracted automatically and that you want to use as context for the anomaly detection.
  • If so, use the JavaScript console to extract the value (see the sketch after the code examples below). Read this document for further information about field classifications: https://support.loomsystems.com/loom-guides/setting-labels-and-classifications
  • Pay attention to fields named "msg-candidate". These are fields that Sophie suspects might be the actual message in the log but was not confident enough to label as such.
  • If you think the extracted field is indeed the message, click "restore" and label the field as the message.
  • Field names should be simple and intuitive, without prefixes such as “syslog_”.
  • Delete redundant/irrelevant fields
  • Run the test on 1000 lines
  • Make sure "Stacktrace Detection" is enabled when there are stack trace logs, and disabled when there are none.
  • Use the "Merge" syntax if you identify multiline entries in the data source that aren't handled properly. For example:

if (!sample.startsWith('2016')) {
    // Lines that do not start with the timestamp's year belong to the
    // previous event; merge them into the preceding log line.
    merge();
}
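Hard-coding the year ('2016') only works for that particular data set. As a rough sketch, and assuming the console exposes the same sample variable shown above, the condition can instead test for any ISO-style date at the start of the line:

// Sketch: merge any line that does not begin with an ISO-style date
// (e.g. "2016-05-19 13:27:01"), treating it as a continuation of the
// previous event.
if (!/^\d{4}-\d{2}-\d{2}/.test(sample)) {
    merge();
}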

  • You can also use the "Drop" function in the "Structure" screen to discard unwanted lines. For example:

if (!sample.startsWith('{')) {
    // Drop any line that is not a JSON object (does not start with '{').
    drop();
}
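For the context-field extraction mentioned earlier (e.g. pulling a component name such as "cinder.db.sqlalchemy.api" out of the raw line), the same console can be used. The snippet below is only a sketch: sample is the raw line as in the examples above, while addField is a hypothetical placeholder for whatever field-assignment call your Sophie version provides (see the classifications guide linked above for the exact syntax):

// Hypothetical sketch: capture a dotted component name from the raw line
// and store it as a field. "addField" is a placeholder, not a documented
// Sophie function.
var match = sample.match(/([a-z_]+(?:\.[a-z_]+)+)/);
if (match) {
    addField('component', match[1]);
}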

CLASSIFICATIONS:

Patterning:
