This article will help you create a new Source Type for a source that does not exist in Sophie’s log template repository.

  • Under “Settings” >> “Add and Manage Data” >> “Source Type”, use the search bar to double-check that your desired source type does not already exist in Sophie.
  • Prepare a .txt file of at least 1000 example events (max: 5000). If possible, create the examples without any additional header wrapping them (such as a Syslog header).
  • Make sure that the file represents the format that will eventually be sent to Sophie (e.g. if you’re going to stream IIS logs formatted as JSON, create the source type using IIS logs in this format).
  • Go to the “Source Type” screen and click “Create a new source” (+New). Set the name of the source. The name must not include any spaces; use only letters, numbers, dashes and underscores.
  • Try to use the same naming convention that will be used in the shipper you'll be using to send the log to Sophie (e.g. if you're using Syslog as the transport, try to use the syslog tag as the name of the Source Type). This will help Sophie to automatically match the data to the correct Source Type once you start streaming it.
  • Click “add examples” and then upload the file you prepared ahead of time.
  • Sophie will automatically start learning the examples uploaded. Once finished, review the results and make sure that:
  • All 5 labels are in place (when applicable). Please note that assigning the labels is critical for the Anomaly Detection algorithms to work properly.
  • Review the summarized view at the bottom of the screen. Make sure there are no errors, timestamp failures, or cases where severity was not assigned.
  • If there are, click on the red question mark, copy the failing examples from the popup screen, paste them into the search line above, and use the JavaScript console to correct the parsing.
  • Pay attention to the Duration (in seconds) and make sure it does not exceed a reasonable time (0.05).
  • Verify that there are no badly parsed fields (e.g. a message that contains parts of other fields, or fields that were split incorrectly). Here is an example of how a message should be extracted (compare the raw message to the line marked as “message”):
  • Browse 15 examples (by clicking on the arrow icons) to verify that Sophie's extraction is correct and applicable to all cases (try to find examples that differ from one another).
  • Browsing through the examples will also help you understand which additional fields to extract: consider whether there is any type of information you need to measure (Meter/Gauge/Histogram) or to use as context for the Auto Root Cause analysis (ARC Only). An example of such a field could be the name of a component (such as "cinder.db.sqlalchemy.api" in the example above) that was not extracted automatically and that you want to use as context for the anomaly detection.
  • If so, use the JavaScript console to extract the value. Read this document for further information about field classifications:
  • Pay attention to fields named "msg-candidate". These are fields that Sophie thinks might be the actual message in the log, but it was not sure enough to label them.
  • If you think that the extracted field is indeed the message, click "restore" and label the field as a message.
  • Field names should be simple and intuitive, without prefixes such as “syslog_”.
  • Delete redundant/irrelevant fields
  • Run the test on 1000 lines
  • Make sure "Stacktrace Detection" is enabled when the logs contain stack traces, and disabled when they do not.
  • Use the "Merge" syntax if you identify multiline events in the data source that aren't handled properly. For example:

if (!sample.startsWith('2016')) {


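To make the intent of that condition concrete, here is a minimal sketch in plain JavaScript of what merging multiline events means. It is not Sophie's "Merge" syntax: the `mergeMultiline` helper and its `startsNewEvent` parameter are hypothetical names used only for illustration, and the log lines are made up. It assumes, as the condition above does, that every new event begins with the year `2016` and that any other line continues the previous event.

```javascript
// Hypothetical helper, not part of Sophie: joins continuation lines
// onto the event that precedes them.
function mergeMultiline(lines, startsNewEvent) {
  const events = [];
  for (const line of lines) {
    if (events.length > 0 && !startsNewEvent(line)) {
      // Continuation line: append it to the previous event.
      events[events.length - 1] += '\n' + line;
    } else {
      // Either a new event or the very first line of the input.
      events.push(line);
    }
  }
  return events;
}

// Assumption: each new event begins with its timestamp year.
const startsWith2016 = (sample) => sample.startsWith('2016');

const merged = mergeMultiline(
  [
    '2016-03-01 10:00:00 ERROR Request failed',
    'Traceback (most recent call last):',
    '  File "api.py", line 12, in get',
    '2016-03-01 10:00:01 INFO Request served',
  ],
  startsWith2016
);
console.log(merged.length); // 2
```

A hard-coded year only works for a sample taken from that year, so a condition like this is best treated as a quick fix for the uploaded examples.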

  • You can also use the "Drop" function in the "structure" screen. As an example:

if (!sample.startsWith('{')) {


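In plain JavaScript terms, the condition above is true for any sample that does not begin with an opening brace, for instance a stray text line mixed into a stream of JSON events. The sketch below only illustrates which samples such a condition would select. It is not Sophie's "Drop" function, `shouldDrop` is a hypothetical name, and the sample lines are invented.

```javascript
// Hypothetical predicate mirroring the condition above: true when the
// sample does not look like the start of a JSON object.
function shouldDrop(sample) {
  return !sample.startsWith('{');
}

const samples = [
  '{"level":"info","msg":"service started"}',
  '# log rotated',
  '{"level":"error","msg":"disk full"}',
];

// Keep only the samples that the condition would not drop.
const kept = samples.filter((sample) => !shouldDrop(sample));
console.log(kept.length); // 2
```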

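The sample-file requirement from the preparation steps (at least 1000 and at most 5000 example events) can be checked before uploading. The following is a minimal sketch that assumes one event per line; `checkSampleFile` is a hypothetical helper and not part of Sophie.

```javascript
// Limits taken from the preparation step: 1000 to 5000 example events.
const MIN_EVENTS = 1000;
const MAX_EVENTS = 5000;

// Hypothetical helper: counts non-empty lines, assuming one event per line.
function checkSampleFile(content) {
  const events = content.split(/\r?\n/).filter((line) => line.trim() !== '');
  return {
    count: events.length,
    ok: events.length >= MIN_EVENTS && events.length <= MAX_EVENTS,
  };
}

// Example: 1200 synthetic JSON events, one per line.
const sampleContent = Array.from({ length: 1200 }, (_, i) => `{"id":${i}}`).join('\n');
console.log(checkSampleFile(sampleContent)); // { count: 1200, ok: true }
```

If the source contains multiline events, the one-event-per-line assumption overcounts, so the result should be read as an upper bound.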

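The naming rule for a new Source Type (no spaces; only letters, numbers, dashes and underscores) can be expressed as a single regular expression. This is a sketch under the assumption that "letters" means ASCII letters; `isValidSourceTypeName` is a hypothetical helper, not something Sophie provides.

```javascript
// Allowed characters per the naming rule: letters, numbers, dashes, underscores.
// Assumption: only ASCII letters are accepted.
const SOURCE_TYPE_NAME = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: true when the name contains only allowed characters.
function isValidSourceTypeName(name) {
  return SOURCE_TYPE_NAME.test(name);
}

console.log(isValidSourceTypeName('iis-json_v2')); // true
console.log(isValidSourceTypeName('iis json'));    // false
```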
