Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters. The Moein monitoring system can monitor Spark in both modes of operation. It collects performance metrics from the master and its workers, from the executors running on each worker, and JVM-level metrics such as heap usage, garbage collection, and RDD statistics. The following is a list of the performance metrics collected for Apache Spark:
General Information:
Master Node Information:
- Master Number Of Used Cores
- Number Of Active Applications
- Number Of Completed Applications
- Master Used Memory Percentage
- Master Core Used Percentage
- Number Of Completed Drivers
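One way to derive the master-level KPIs above is from the JSON summary that a standalone Spark master serves alongside its web UI (typically `http://<master>:8080/json`). The sketch below is a minimal illustration; the exact field names (`coresused`, `memoryused`, `activeapps`, `completeddrivers`, and so on) are assumptions about the payload shape and may differ across Spark versions.

```python
import json
from urllib.request import urlopen


def fetch_master_status(url="http://localhost:8080/json"):
    """Fetch the standalone master's JSON status page (URL is an assumption)."""
    with urlopen(url) as resp:
        return json.load(resp)


def master_usage(status):
    """Derive the master-level KPIs listed above from the status payload."""
    return {
        "used_cores": status["coresused"],
        "core_used_pct": 100.0 * status["coresused"] / status["cores"],
        "memory_used_pct": 100.0 * status["memoryused"] / status["memory"],
        "active_apps": len(status["activeapps"]),
        "completed_apps": len(status["completedapps"]),
        "completed_drivers": len(status["completeddrivers"]),
    }


# Sample payload shaped like a master /json response (values are illustrative):
sample = {
    "cores": 16, "coresused": 8,
    "memory": 65536, "memoryused": 16384,   # MiB
    "activeapps": [{"id": "app-1"}],
    "completedapps": [{"id": "app-0"}, {"id": "app-2"}],
    "completeddrivers": [],
}
print(master_usage(sample))
```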
Workers Metrics (as reported by the master):
- Worker Number Of Used Cores
- Worker Number Of Free Cores
- Elapsed Time Since Last Heartbeat
- Worker Used Memory Percentage
- Worker Core Used Percentage
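The per-worker figures above can be computed from the entries of the master's `workers` list. A minimal sketch, assuming each entry carries `cores`/`coresused`, `memory`/`memoryused`, and a `lastheartbeat` timestamp in epoch milliseconds (field names are assumptions about the payload):

```python
import time


def worker_heartbeat_age_ms(worker, now_ms=None):
    """Milliseconds elapsed since the worker's last heartbeat.

    `worker` is one entry from the master's workers list; the
    `lastheartbeat` field (epoch ms) is an assumed field name.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - worker["lastheartbeat"]


def worker_usage(worker):
    """Used/free cores and usage percentages for one worker entry."""
    return {
        "used_cores": worker["coresused"],
        "free_cores": worker["cores"] - worker["coresused"],
        "core_used_pct": 100.0 * worker["coresused"] / worker["cores"],
        "memory_used_pct": 100.0 * worker["memoryused"] / worker["memory"],
    }


sample_worker = {"cores": 8, "coresused": 6, "memory": 16384,
                 "memoryused": 4096, "lastheartbeat": 1_700_000_000_000}
print(worker_usage(sample_worker))
print(worker_heartbeat_age_ms(sample_worker, now_ms=1_700_000_005_000))  # 5000
```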
Applications Metrics:
- Application Number Of Allocated Cores
- Application Running Duration
- Application Running Status
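Application status and running duration can be read from Spark's monitoring REST API (`/api/v1/applications`), which returns one record per application with a list of attempts. The field names used below (`attempts`, `startTimeEpoch`, `completed`, `duration`) follow that API but should be treated as assumptions for your Spark version:

```python
def app_metrics(app, now_ms):
    """Running status and duration for one /api/v1/applications record."""
    attempt = app["attempts"][-1]          # most recent attempt
    if attempt["completed"]:
        # Finished apps report their total duration directly (milliseconds).
        return {"status": "FINISHED", "duration_ms": attempt["duration"]}
    # Running apps: duration is time elapsed since the attempt started.
    return {"status": "RUNNING",
            "duration_ms": now_ms - attempt["startTimeEpoch"]}


sample_app = {"id": "app-20240101", "name": "etl-job",
              "attempts": [{"startTimeEpoch": 1_700_000_000_000,
                            "completed": False, "duration": 0}]}
print(app_metrics(sample_app, now_ms=1_700_000_060_000))
```

The number of cores allocated to an application is not part of this record; it would come from the application detail (e.g. the master's per-application view), which is outside this sketch.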
Worker Metrics:
- Master Web Service Address
- Worker Number Of Used Cores
- Worker Used Memory Percentage
- Worker Core Used Percentage
- Total Number Of Running Executors
- Total Number Of Finished Executors
Executors in Workers Metrics:
- Executor Application Name
- Number Of Executor Application Cores
- Executor Application User
- Executor Application Memory Per Slave
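Both the worker-level metrics and the per-executor application details above can be taken from the JSON page each worker serves next to its web UI (typically `http://<worker>:8081/json`). The sketch below assumes field names such as `masterwebuiurl`, `executors`, `finishedexecutors`, and a nested `appdesc` with `name`/`cores`/`user`/`memoryperslave`; verify them against your Spark version.

```python
def worker_summary(status):
    """Worker-level KPIs from a worker's /json page (field names assumed)."""
    return {
        "master_web_ui": status["masterwebuiurl"],
        "used_cores": status["coresused"],
        "core_used_pct": 100.0 * status["coresused"] / status["cores"],
        "memory_used_pct": 100.0 * status["memoryused"] / status["memory"],
        "running_executors": len(status["executors"]),
        "finished_executors": len(status["finishedexecutors"]),
    }


def executor_apps(status):
    """Per-executor application details: name, cores, user, memory per slave."""
    return [{
        "app_name": ex["appdesc"]["name"],
        "app_cores": ex["appdesc"]["cores"],
        "app_user": ex["appdesc"]["user"],
        "memory_per_slave": ex["appdesc"]["memoryperslave"],
    } for ex in status["executors"]]


sample_status = {
    "masterwebuiurl": "http://master:8080",
    "cores": 8, "coresused": 4,
    "memory": 16384, "memoryused": 8192,   # MiB
    "executors": [{"appdesc": {"name": "etl-job", "cores": 4,
                               "user": "spark", "memoryperslave": 2048}}],
    "finishedexecutors": [],
}
print(worker_summary(sample_status))
print(executor_apps(sample_status))
```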
Heap and Non Heap Memory:
- Committed Non-Heap Memory
- Heap Memory Used Percentage
- Non-Heap Memory Used Percentage
Memory Pools KPIs:
- Memory Pool Committed Memory
- Memory Pool Initial Memory
- Memory Pool Maximum Memory
- Memory Pool Used Percentage
GC Metrics:
- Average Garbage Collection Time
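The heap, memory-pool, and GC KPIs above can be derived from the gauges Spark's metrics servlet exposes (by default at `/metrics/json/` on the UI port), provided the JVM source is enabled in `spark.metrics.conf` (e.g. `driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource`). The gauge names below follow Dropwizard's JVM instrumentation (`jvm.heap.*`, `jvm.pools.*`, `jvm.<collector>.time`/`count`) and the collector/pool names are assumptions that depend on the JVM's GC configuration:

```python
def jvm_kpis(gauges):
    """Heap, non-heap, pool, and GC KPIs from /metrics/json gauges.

    `gauges` maps metric name -> {"value": ...}; names follow Dropwizard's
    JVM instrumentation and are assumptions for your Spark/JVM setup.
    """
    g = lambda name: gauges[name]["value"]
    gc_count = g("jvm.PS-MarkSweep.count")   # collector name is hypothetical
    return {
        "heap_used_pct": 100.0 * g("jvm.heap.used") / g("jvm.heap.max"),
        "non_heap_committed": g("jvm.non-heap.committed"),
        "old_gen_used_pct": 100.0 * g("jvm.pools.PS-Old-Gen.used")
                                  / g("jvm.pools.PS-Old-Gen.max"),
        # Average GC pause = cumulative GC time / number of collections.
        "avg_gc_time_ms": g("jvm.PS-MarkSweep.time") / gc_count if gc_count else 0.0,
    }


sample_gauges = {
    "jvm.heap.used": {"value": 512},
    "jvm.heap.max": {"value": 2048},
    "jvm.non-heap.committed": {"value": 96},
    "jvm.pools.PS-Old-Gen.used": {"value": 300},
    "jvm.pools.PS-Old-Gen.max": {"value": 1200},
    "jvm.PS-MarkSweep.time": {"value": 450},
    "jvm.PS-MarkSweep.count": {"value": 9},
}
print(jvm_kpis(sample_gauges))
```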
RDD Metrics:
- Parallel Listing Job Count
- Total Number Of Compilations
Communication Protocols: