When Sophie raises the alert "Sophie is running low on disk, data ingestion will soon be stopped. Follow the Data Management guide or contact email@example.com", it means the available disk space is no longer sufficient for your current utilization.
Administrators need to configure optimal utilization of the available disk space.
Configure Optimal Utilization of the Available Disk Space
If Sophie is running low on disk, perform the following steps:
Verify Direct-Attached Storage
First, make sure you're running on a locally-attached disk. Running with spinning disks, or even SAN, has a severe impact on performance.
To verify whether the data is stored on a spinning disk, run the following command from the terminal:
If fs.data.spins is true, the data is indeed on a spinning disk.
Another way to verify this is to run: cat /sys/block/sde/queue/rotational (replace sde with the disk holding the Elastic data)
1 = spinning (bad)
0 = SSD (good)
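The rotational check above can also be scripted. The following is a minimal sketch, assuming a Linux host that exposes /sys/block; the function name is_rotational and its sys_block parameter are illustrative and not part of Sophie.

```python
from pathlib import Path


def is_rotational(device: str, sys_block: str = "/sys/block") -> bool:
    """Return True if the kernel reports `device` as a spinning disk.

    Reads <sys_block>/<device>/queue/rotational, which holds "1" for a
    spinning disk and "0" for an SSD. `sys_block` is overridable only to
    make the function easy to test (an assumption of this sketch).
    """
    flag = Path(sys_block, device, "queue", "rotational").read_text().strip()
    return flag == "1"
```

For example, is_rotational("sde") checks the same file as the cat command above; replace sde with the disk holding the Elastic data.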
Steps to Improve Throughput and Disk Usage
Remove data you don't need:
In your Sophie instance, go to Settings->General->Storage and check the "heaviest" sources. For each "heavy" source:
- Review its structure: are there any properties you can remove? Decreasing the number of properties in the stored events has the biggest impact.
- Do you need the rawMessage property? It's used in notifications and for free-text correlations, and it is somewhat helpful in Kibana, but it doubles the size of a document. If the structuring is good, remove this field by going to the source-settings and setting elastic.store_raw_event to false.
- First, try to be selective with what you drop. For example, you might prefer to drop low-severity events in the data-input.
- Consider sub-sampling (i.e. keeping one out of every X events). This can be controlled per-source via the elasticsearch.subsampling_ratio setting.
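To illustrate what sub-sampling does to an event stream, here is a minimal sketch that keeps one out of every X events. It is not Sophie's implementation: the function name subsample is hypothetical, and it assumes the ratio is expressed as the integer X. How elasticsearch.subsampling_ratio is actually encoded should be checked in the source-settings.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def subsample(events: Iterable[T], ratio: int) -> Iterator[T]:
    """Yield one event out of every `ratio` events, starting with the first."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    for index, event in enumerate(events):
        # Keep events at positions 0, ratio, 2*ratio, ... and drop the rest.
        if index % ratio == 0:
            yield event
```

With a ratio of 3, roughly two thirds of the events from that source are dropped before they are stored, so disk usage for the source shrinks accordingly.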
Steps to improve the throughput
Optimize the bulk-indexing interval
Adjust the following general-settings:
Use the operational-dashboard to measure the effect. The objective is to get the documents.write metric as high as possible. Note that at some point you might start seeing errors in the Elastic logs, which means you're sending it more than it can handle.
Keep indices size at no more than the size of the RAM allocated to Elastic
e.g. if Elastic has 30GB RAM, keep your indices smaller than that.
If some of your indices grow larger, then consider either:
- Increasing the number of shards (even if working with a single instance).
- Changing the index rotation to be hourly instead of daily.
Both of these settings can be found under the source-settings (elasticsearch.number_of_shards and elasticsearch.index_time_interval). Increasing the number of shards is almost always better, but if the daily volume of a source is more than 20 times the size of the memory allocated to Elastic, switch to hourly indices.
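The rule of thumb above can be written down as a small decision helper. This is only a sketch of the guidance in this section: the function name index_layout_advice, its parameters, and the returned strings are made up for illustration, and sizes are assumed to be in GB.

```python
def index_layout_advice(index_size_gb: float,
                        daily_volume_gb: float,
                        elastic_ram_gb: float) -> str:
    """Suggest an action for one source, following the guidance above."""
    # Indices no larger than the RAM allocated to Elastic need no change.
    if index_size_gb <= elastic_ram_gb:
        return "ok"
    # Very heavy sources (more than 20x the RAM per day): go hourly.
    if daily_volume_gb > 20 * elastic_ram_gb:
        return "switch elasticsearch.index_time_interval to hourly"
    # Otherwise, more shards is almost always the better option.
    return "increase elasticsearch.number_of_shards"
```

For instance, with 30GB of RAM allocated to Elastic, a source producing 100GB per day falls under "increase elasticsearch.number_of_shards", while a source producing 700GB per day (more than 600GB) falls under hourly indices.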
Remove/disable read-heavy modules
The heavy readers are:
- Custom Alerts (especially ones querying event-* or with lengthy periods)
- ARCA modules (entity analysis, highlight analysis)
Also, each alert creation involves querying Elastic, so make sure you're not generating too many alerts. A small number of incidents might be misleading, so check the number of alerts per incident. If there are many hundreds of daily alerts, consider tweaking the anomaly-detection engine.
Steps to reduce disk usage
Compress large indices
Under source-settings, set elasticsearch.index_codec to best_compression. Note that this only takes effect for new indices.
Assessing disk performance
There are several ways of doing this, but the recommended one is to run iostat -xd while the system is running. Check the disk that is running Elastic, and look for the r_await and w_await columns. Decent values are no more than a few milliseconds. Ten milliseconds or more means the disk is too slow.
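Reading the r_await and w_await columns can be automated. The sketch below parses text in the layout printed by iostat -xd and reports devices whose read or write latency reaches the ten-millisecond mark mentioned above. The function name slow_disks is hypothetical, and the parser assumes a header row starting with "Device" that contains r_await and w_await columns, which may not hold for every sysstat version or locale.

```python
from typing import Dict, Tuple


def slow_disks(iostat_output: str,
               limit_ms: float = 10.0) -> Dict[str, Tuple[float, float]]:
    """Map device name -> (r_await, w_await) for devices at or above limit_ms."""
    slow: Dict[str, Tuple[float, float]] = {}
    columns = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row names the columns; older versions print "Device:".
        if fields[0].rstrip(":") == "Device":
            columns = fields
            continue
        # Skip the banner line and anything that is not a data row.
        if columns is None or len(fields) != len(columns):
            continue
        row = dict(zip(columns[1:], fields[1:]))
        if "r_await" not in row or "w_await" not in row:
            continue
        r_await, w_await = float(row["r_await"]), float(row["w_await"])
        if max(r_await, w_await) >= limit_ms:
            slow[fields[0]] = (r_await, w_await)
    return slow
```

The output of subprocess.run(["iostat", "-xd"], capture_output=True, text=True).stdout can be passed to slow_disks directly; only the entry for the disk that holds the Elastic data matters here.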