28 Essential IT Automation Examples for Proactive Monitoring, Alerting, and Remediation

Endpoint monitoring and alerting is a central part of IT management. Good monitoring and alerting practices enable you to proactively identify issues, resolve them faster, and save you and your users time and frustration further down the line.

The challenge is knowing what to monitor for, what requires an alert, which issues can be automatically resolved, and which need a personal touch. That knowledge can take years to develop, and even then the best IT teams can still struggle with reducing alert fatigue and ticket noise across their networks and devices.

To help condense that ramp-up time and narrow your focus, we’ve put together this list of ideas for conditions to monitor for, along with suggested triggers and actions for automation. These are based on recommendations from our customers, and from NinjaOne’s experience helping IT teams build more effective, actionable monitoring.

What to Monitor and Alert On: How to Use the Checklists Below

For each condition, we describe what is being monitored, how to set up the monitor in NinjaOne, and what actions should be taken if the condition is triggered. Some monitoring suggestions are concrete while others may require a small amount of customization to fit them to your use case.

Note: While we’ve written this checklist with NinjaOne and our customers in mind, these monitoring ideas should be easily adaptable to any RMM or endpoint management solution.

This list is not exhaustive, and not every item will apply to every environment.

Once you’ve gotten started building out your monitoring around these suggestions, you’ll want to develop a more customized and robust monitoring strategy specific to your needs. We’ll close out this post with additional recommendations to help with that effort and make monitoring, alerting, and ticketing more streamlined and effective.

Device Health Monitoring

Device health monitoring checklist

Monitor for continuous critical events

  • Condition: Critical Events
  • Threshold: 80 critical events over 5 minutes
  • Action: Ticket and investigate
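A threshold such as "80 critical events over 5 minutes" amounts to a sliding-window count. The following is a minimal sketch of that idea in Python, assuming events are recorded in timestamp order. `EventRateMonitor` and its method names are hypothetical and are not part of NinjaOne.

```python
from collections import deque
from datetime import datetime, timedelta


class EventRateMonitor:
    """Tracks event timestamps and reports when a rate threshold is reached."""

    def __init__(self, threshold: int = 80, window: timedelta = timedelta(minutes=5)):
        self.threshold = threshold
        self.window = window
        self._timestamps = deque()

    def record(self, when: datetime) -> bool:
        """Record one critical event; return True if the threshold is met."""
        self._timestamps.append(when)
        # Drop events that have fallen out of the sliding window.
        cutoff = when - self.window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

A caller would feed each critical event's timestamp to `record` and open a ticket the first time it returns `True`.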

Identify when a device is unintentionally rebooted

  • Condition: Windows Event
  • Event Source: Microsoft-Windows-Kernel-Power
  • Event ID: 41
  • Note: This condition is better suited to servers, because workstations and laptops can log this event as a result of user action (for example, holding down the power button).
  • Action: Ticket and investigate

Identify devices in need of a reboot

  • Condition: System Uptime
  • Threshold recommendation: 30 or 60 days
  • Action: Restart the device during an appropriate window. Automated remediation may work for workstations.

Monitor for offline endpoints

  • Condition: Device Down
  • Threshold recommendation:
    • 10 minutes or less (servers)
    • 24+ hours (workstations)
  • Action:
    • Ticket and investigate
    • Wake-on-LAN (servers only)
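The per-role thresholds above can be expressed as a small lookup. This sketch uses the upper bounds from the checklist (10 minutes for servers, 24 hours for workstations). The role names and the `is_device_down` helper are illustrative assumptions, not NinjaOne settings.

```python
from datetime import datetime, timedelta

# Thresholds taken from the checklist; the role names are illustrative.
OFFLINE_THRESHOLDS = {
    "server": timedelta(minutes=10),
    "workstation": timedelta(hours=24),
}


def is_device_down(role: str, last_seen: datetime, now: datetime) -> bool:
    """Return True once a device has been silent for at least its role's threshold."""
    return now - last_seen >= OFFLINE_THRESHOLDS[role]
```

Keeping the thresholds in one table makes it easy to tighten the server value without touching the workstation one.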

Monitor for hardware changes

  • Activity: System
  • Name: Adapter added/changed, CPU added/removed, Disk drive added/removed, Memory added/removed
  • Action: Ticket and investigate

Drive Monitoring

Drive monitoring checklist

Monitor for potential disk failure

  • Condition: Windows SMART Status Degraded
    and/or
  • Condition: Windows Event
  • Event Source: Disk
  • Event IDs: 7, 11, 29, 41, 51, 153
  • Action: Ticket and investigate

Identify when disk space is approaching capacity

  • Condition: Disk Free Space
  • Threshold: 20% and again at 10%
  • Action: Perform disk cleanup and delete temporary files
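The two-stage threshold above (alert at 20% free, then again at 10%) can be sketched with Python's standard `shutil.disk_usage`. The severity labels and function names here are assumptions chosen for illustration.

```python
import shutil


def free_space_severity(free_bytes: int, total_bytes: int) -> str:
    """Classify free space against the 20% and 10% thresholds."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100 * free_bytes / total_bytes
    if free_pct <= 10:
        return "critical"
    if free_pct <= 20:
        return "warning"
    return "ok"


def check_path(path: str) -> str:
    """Classify the volume that contains the given path."""
    usage = shutil.disk_usage(path)
    return free_space_severity(usage.free, usage.total)
```

A "warning" result could trigger the automated cleanup, with "critical" reserved for a ticket if the cleanup did not free enough space.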

Monitor for potential RAID failures

  • Condition: RAID Health Status
  • Thresholds: Critical and Non-Critical for all attributes
  • Action: Ticket and investigate

Monitor for prolonged high disk usage

  • Condition: Disk Usage
  • Thresholds: 90% or greater, sustained over a 30- or 60-minute period. Raising the threshold to 95% is also common and further reduces noise.
  • Action: Ticket and investigate

Monitor for high disk activity rate

  • Condition: Disk Active Time
  • Thresholds: Greater than 90% for 15 minutes
  • Action: Ticket and investigate
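A condition such as "greater than 90% for 15 minutes" is meant to fire only on sustained load, not on a brief spike. One way to read it is sketched below: the alert fires once every sample has stayed above the limit for the full period. The `SustainedThreshold` class is hypothetical, and an RMM may evaluate the condition differently (for example, by averaging).

```python
from datetime import datetime, timedelta
from typing import Optional


class SustainedThreshold:
    """Fires only when a metric stays above a limit for a full duration."""

    def __init__(self, limit: float = 90.0, duration: timedelta = timedelta(minutes=15)):
        self.limit = limit
        self.duration = duration
        self._breach_started: Optional[datetime] = None

    def update(self, value: float, when: datetime) -> bool:
        """Feed one sample; return True if the breach has lasted long enough."""
        if value <= self.limit:
            # Any sample at or below the limit ends the current breach.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = when
        return when - self._breach_started >= self.duration
```

Because a single dip resets the timer, short bursts of activity never generate a ticket.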

Monitor for high memory usage

  • Condition: Memory Usage
  • Thresholds: Greater than 90% for 15 minutes
  • Action: Ticket and investigate

Application Monitoring

Application monitoring checklist

Identify if required applications exist on an endpoint

  • Condition: Software
  • Usage:
    • Client line-of-business applications (Examples: AutoCAD, SAP, Photoshop)
    • Client productivity solutions (Examples: Zoom, Microsoft Teams, Dropbox, Slack, Office, Acrobat)
    • Client support tools (Examples: TeamViewer, CCleaner, AutoElevate, BleachBit)
  • Action: Automatically install the application if it is missing and required
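The check behind this condition is a comparison between a required list and the software inventory. A minimal sketch follows, assuming exact, case-insensitive name matching. Real inventories often include version strings, so a production check would probably need looser matching. `missing_software` is a hypothetical helper.

```python
from typing import Iterable, List


def missing_software(required: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return required application names that are absent from the inventory."""
    installed_names = {name.casefold() for name in installed}
    return sorted(name for name in required if name.casefold() not in installed_names)
```

Each name returned would then be handed to whatever automation installs that application.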

Monitor whether critical applications are running (particularly for servers)

  • Condition: Process / Service
  • Threshold: Down for at least 3 minutes
  • Example Processes:
    • For workstations: TeamViewer, RDP, DLP
    • For an Exchange server: MSExchangeServiceHost, MSExchangeIMAP4, MSExchangePOP3, etc.
    • For an Active Directory server: Netlogon, dnscache, rpcss, etc.
    • For a SQL server: mssqlserver, sqlbrowser, sqlwriter, etc.
  • Action: Restart the service or process

Monitor resource usage for applications known to cause performance issues

  • Condition: Process Resource
  • Threshold: 90%+ for at least 5 minutes
  • Example Processes: Outlook, Chrome, and TeamViewer
  • Action:
    • Ticket and investigate
    • Disable at startup

Monitor for application crashes

  • Condition: Windows Event
  • Source: Application Hang
  • Event ID: 1002
  • Action: Ticket and investigate

Network Monitoring

Network monitoring checklist

Monitor for unexpected bandwidth usage

  • Condition: Network Utilization
  • Direction: Out
  • Threshold: Determined by the type of endpoint and the network capacity
    • Each server should have its own threshold based on its use case
    • Workstation network monitor thresholds should be high enough to trigger only when a client’s network is at risk
  • Action: Ticket and investigate

Ensure network devices are up

  • Condition: Device Down
  • Duration: 3 minutes

Monitor which ports are open

  • Condition: Cloud monitor
  • Ports: 80 (HTTP), 443 (HTTPS), 25 (SMTP), 21 (FTP)

Monitor client website availability

  • Monitor: Ping
  • Target: Client Website
  • Condition: Failure (5 times)
  • Action: Ticket and investigate
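Requiring five failures before alerting filters out single dropped pings. The sketch below assumes the five failures must be consecutive, which the checklist does not state explicitly. `ConsecutiveFailureMonitor` is a hypothetical name, and the ping itself is left to the caller.

```python
class ConsecutiveFailureMonitor:
    """Raises an alert once a check has failed N times in a row."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, success: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if success:
            self.failures = 0
            return False
        self.failures += 1
        # Fire exactly once, on the Nth failure, to avoid duplicate tickets.
        return self.failures == self.max_failures
```

Firing only on the fifth failure, rather than on every failure after it, keeps a long outage from opening a new ticket on each check.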

Basic Security Monitoring

Basic security monitoring checklist

Identify if Windows Firewall has been turned off

  • Condition: Windows Event
  • Event Source: System
  • Event ID: 5025
  • Action: Turn on Windows Firewall

Identify if antivirus and security tools are installed and/or running on an endpoint

  • Condition: Software
  • Presence: Doesn’t Exist
  • Software (examples): Huntress, Cylance, ThreatLocker, Sophos
  • Action: Automate the installation of the missing security software
    and
  • Condition: Process / Service
  • State: Down
  • Process (examples): threatlockerservice.exe, EPUpdateService.exe
  • Action: Restart the process

Monitor for unintegrated AV / EDR threats detected

  • Condition: Windows Event
  • Example (Sophos)
  • Event Source: Sophos Anti-Virus
  • Event IDs: 6, 16, 32, 42

Monitor for failed user logon attempts

  • Condition: Windows Event
  • Event Source: Microsoft-Windows-Security-Auditing
  • Event IDs: 4625, 4740, 644 (local accounts); 4777 (domain login)
  • Action: Ticket and investigate

Monitor for the creation, elevation, or removal of users on an endpoint

  • Condition: Windows Event
  • Event Source: Microsoft-Windows-Security-Auditing
  • Event IDs: 4720, 4732, 4729
  • Action: Ticket and investigate

Identify if the drives on an endpoint are encrypted/unencrypted

  • Condition: Script Result
  • Script (Custom): Check Encryption Status
  • Action: Ticket and investigate

Monitor backup failures (Ninja Data Protection)

  • Activity: Ninja Data Protection
  • Name: Backup job failed

Monitor backup failures (other backup vendors)

  • Condition: Windows Event
  • Example Source / IDs (Veeam):
    • Event Source: Veeam Agent
    • Event IDs: 190
    • Text Contains: Failed
  • Example Source / IDs (Acronis):
    • Event Source: Online Backup System
    • Event ID: 1
    • Text Contains: Failed

4 Keys to Leveling-up Your Monitoring

  1. Create a baseline device health monitoring template.
  2. Talk to customers about their priorities.
    1. Which servers and workstations are important?
    2. What are their critical line-of-business or productivity applications?
    3. Where are their IT pain points?
  3. Monitor your PSA / ticketing system for recurring issues.
    1. Adjust alerting to avoid ticket noise.
  4. Monitor clients’ event logs for recurring issues.
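Spotting recurring issues in a ticket export is largely a counting exercise. The sketch below assumes each ticket is a dictionary with `device` and `condition` keys. Those field names and the minimum count of 3 are illustrative, not taken from any particular PSA.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def recurring_issues(
    tickets: Iterable[Dict[str, str]], min_count: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Return (device, condition) pairs seen at least min_count times, noisiest first."""
    counts = Counter((t["device"], t["condition"]) for t in tickets)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

The pairs at the top of the result are candidates for automated remediation or a narrower alert scope.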

Ticketing & Alerting Best Practices 

  1. Only alert on actionable information. If you don’t have a specific response associated with a monitor, don’t monitor it.
  2. Categorize your alerts to go to different service boards in your PSA based on the type or priority.
  3. Host regular alert housekeeping meetings to discuss:
    • Which alerts are causing the most noise? Can they be removed or narrowed in scope?
    • What is not being monitored or creating notifications that should be?
    • Which common alerts can be automatically remediated?
    • Are there any upcoming projects that may generate alerts?
  4. Clean up your tickets and alerts when they are resolved.
    • In NinjaOne, many conditions have a ‘Reset when no longer true’ or ‘Reset when not true for x period’ option to help you clear notifications for issues that resolve themselves.
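The "reset when not true for x period" behavior can be pictured as a small state machine: an alert stays active until its condition has been continuously false for the reset period. This is a sketch of the general idea only, not NinjaOne's implementation. The class name and the 10-minute default are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


class AutoResettingAlert:
    """An alert that clears itself once its condition stays false long enough."""

    def __init__(self, reset_after: timedelta = timedelta(minutes=10)):
        self.reset_after = reset_after
        self.active = False
        self._clear_since: Optional[datetime] = None

    def update(self, condition_true: bool, when: datetime) -> str:
        """Return 'triggered', 'reset', or 'unchanged' for this observation."""
        if condition_true:
            # Any true observation restarts the quiet period.
            self._clear_since = None
            if not self.active:
                self.active = True
                return "triggered"
            return "unchanged"
        if not self.active:
            return "unchanged"
        if self._clear_since is None:
            self._clear_since = when
        if when - self._clear_since >= self.reset_after:
            self.active = False
            self._clear_since = None
            return "reset"
        return "unchanged"
```

Requiring a quiet period before the reset prevents a flapping condition from opening and closing tickets repeatedly.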

More MSP Monitoring Ideas

See Kelvin Tegelaar’s excellent series on remote monitoring using PowerShell. He covers how to monitor everything from network traffic to Active Directory health to Office 365 failed logins, Shodan results, and more. Best of all, he shares PowerShell scripts that are designed to be RMM agnostic. You can also read our blog post on PowerShell vs CMD Prompt differences and when to use each.

We regularly feature his blog posts along with plenty of additional tools and resources in our weekly MSP Bento newsletter. Subscribe now to get the latest edition along with a special list of the most popular tools and resources we’ve shared.

Next Steps

For MSPs, the choice of RMM is critical to business success. The core promise of an RMM is to deliver automation, efficiency, and scale so the MSP can grow profitably. NinjaOne has been rated the #1 RMM for 3+ years in a row because of our ability to deliver a fast, easy-to-use, and powerful platform for MSPs of all sizes.
