Infrastructure Faults

For version 1.0, Mangle supported the following types of infrastructure faults:

CPU Fault
Memory Fault
Disk IO Fault
Kill Process Fault
Docker State Change Faults
Kubernetes Delete Resource Fault
Kubernetes Resource Not Ready Fault
vCenter Disk Fault
vCenter NIC Fault
vCenter VM State Change Fault

From version 2.0, apart from the faults listed above, support has been extended to the following new faults:

File Handler Leak Fault
Disk Space Fault
Kernel Panic Fault
Network Faults: Packet Delay, Packet Duplication, Packet Loss, Packet Corruption
Kubernetes Service Unavailable Fault
AWS EC2 State Change Fault
AWS EC2 Network Fault

Minor improvements have also been included for Kill Process Fault in version 2 of Mangle.

CPU Fault

CPU fault enables spiking cpu usage values for a selected endpoint by a percentage specified by the user. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> CPU.
Select an Endpoint.
Provide a "CPU Load" value. For eg: 80 to simulate a CPU usage of 80% on the selected Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the CPU load of 80% to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Memory Fault

Memory fault enables spiking memory usage values for a selected endpoint by a percentage specified by the user. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Memory.
Select an Endpoint.
Provide a "Memory Load" value. For eg: 80 to simulate a Memory usage of 80% on the selected Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the Memory load of 80% to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Disk IO Fault

Disk IO fault enables spiking disk IO operation for a selected endpoint by an IO size specified by the user in bytes. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Disk IO.
Select an Endpoint.
Provide a "IO Size" value in bytes. For eg: To write in blocks of 5 KB to the disk of the selected Endpoint specify the IO Size as 5120 (5 KB = 5120 bytes). With the specified block size of 5120 bytes, Mangle will not use more than 5 MB (5 MB = 5120 * 1024 bytes) of disk space during the simulation of fault. The space is cleared at the time of fault remediation.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the IO load of 8192000 to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kill Process Fault

Kill Process fault enables abrupt termination of any process that is running on the specified endpoint. Unlike other infrastructure faults like CPU, Memory and Disk IO this fault does not have a timeout field because the fault completes very quickly. Some processes/services may be configured for auto-start and some might require a manual start command to be executed. For the first case, auto-remediation through Mangle is not needed. For the second case, you can specify the remediation command that Mangle should use to start the process again. After the fault in completed and if remediation command was accurately specified, a manual remediation can be triggered from the Requests and Reports tab.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Kill Process.
Select an Endpoint.
Provide a "Process Identifier". This can either be a process id or process name. A process name is preferred if the fault is expected to be scheduled.
From version 2.0 onward, Kill Process fault can kill multiple processes with the same process descriptor. This can be done by setting the "Kill All" drop down to true. If set to false, it will fail if the process descriptor is not unique. Alternatively, you can also use the process id to uniquely identify and kill a process.
Provide a "Remediation Command". For eg: To start the sshd process that was killed on an Ubuntu 17 Server, specify the remediation command as "sudo service ssh start" .
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

File Handler Leak Fault

File Handler Leak fault enables you to simulate conditions where a program requests for a handle to a resource but does not release it when the resource is no longer in use. This condition if left over extended periods of time, will lead to "Too many open file handles" errors and will cause performance degradation or crashes. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> File Handler Leak.
Select an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the out of file handles error to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Disk Space Fault

Disk Space Fault enables you to simulate out of disk or low disk space conditions. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Disk Space.
Select an Endpoint.
Provide a "Target Directory" so Mangle can target a specific directory location or partition to write to for simulating the low disk space condition.
Provide a "Load" value. For eg: 80 to simulate a Disk usage of 80% of the total disk size or space allocated for a partition, on the selected Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the low disk or out of disk condition to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kernel Panic Fault

Kernel Panic Fault simulates conditions where the operating system abruptly stops to prevent further damages, security breaches or data corruption and facilitate diagnosis of a sudden hardware or software failure.

REMEDIATION OPTIONS FOR KERNEL PANIC

Remediation for Kernel Panic is controlled by the operating system itself. Typically on Linux systems, Kernel Panic is usually followed by a system reboot. But in some cases due to the settings specified under file /etc/sysctl.d/99-sysctl.conf the automatic system reboot may not occur. For such cases, a manual reboot needs to be triggered on the endpoint to bring it back to a usable state.

To modify this setting as a one-time option, please run the following command on the endpoint sysctl --system

To modify this setting permanently, remotely log in to the endpoint, modify file /etc/sysctl.d/99-sysctl.conf and add the following command

kernel.panic = 20

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Kernel Panic.
Select an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Network Faults

Network Faults enables you to simulate unfavorable conditions such as packet delay, packet duplication, packet loss and packet corruption. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Packet Delay

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Delay.
Select an Endpoint.
Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault.
Provide a "Latency" value in milliseconds. For eg: 1000 to simulate a packet delay of 1 second on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the packet delay to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Packet Duplication

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Duplicate.
Select an Endpoint.
Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault.
Provide a "Percentage" value to specify what percentage of the packets should be duplicated. For eg: 10 to simulate a packet duplication of 10 percentage on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the packet duplication to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Packet Loss

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Loss.
Select an Endpoint.
Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault.
Provide a "Percentage" value to specify what percentage of the packets should be dropped. For eg: 10 to simulate a packet drop of 10 percentage on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the packet drop to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Packet Corruption

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Corruption.
Select an Endpoint.
Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault.
Provide a "Percentage" value to specify what percentage of the packets should be corrupted. For eg: 10 to simulate a packet corruption of 10 percentage on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used.
Provide a "Timeout" value in milliseconds. For eg: if you need the packet corruption to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Docker State Change

Docker State Change faults enable you to abruptly stop or pause containers running on a Docker host. Unlike other infrastructure faults like CPU, Memory and Disk IO this fault is specific to the Docker endpoint and does not have a timeout field because the fault completes very quickly. Some containers may be configured for auto-start and some might require a manual start command to be executed. For the first case, auto-remediation through Mangle is not needed. For the second case, a manual remediation can be triggered from the Requests and Reports tab after the fault completes.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Docker ---> State Change.
Select an Endpoint (Only Docker Endpoints are listed).
Select the fault.
Provide a "Container Name".
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kubernetes Delete Resource

Kubernetes (K8s) Delete Resource faults enable you to abruptly delete pods or nodes within a K8s cluster. Unlike other infrastructure faults like CPU, Memory and Disk IO this fault is specific to the K8s endpoint and does not have a timeout field because the fault completes very quickly. In most cases, K8s will automatically replace the deleted resource. This fault allows you see how the applications hosted on these pods behave in the event of rescheduling when a K8s resource is deleted and re-created.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> K8S ---> Delete Resource.
Select an Endpoint (Only K8S endpoints are listed).
Select a Resource Type: POD or NODE.
Select a Resource identifier: Resource Name or Resource Labels.
If you choose Resource Name to identify a pod or a node, enter a string.
If you choose Resource Labels provide a key value pair for eg: app=mangle. Since multiple resources can have the same label, you also need to specify if you are interested in a Random Injection. If "Random Injection" is set to true, Mangle will randomly choose one resource in a list of resources identified using the label, for introducing the fault. If "Random Injection" is set to false, it will introduce fault into all resources identified using the resource label.
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". Remediation requests are not supported for this fault.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kubernetes Resource Not Ready

Kubernetes (K8s) Resource Not Ready faults enable you to abruptly put pods or nodes within a K8s cluster into a state that is not suitable for scheduling.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> K8S ---> Delete Resource.
Select an Endpoint (Only K8S endpoints are listed).
Select a Resource Type: POD or NODE.
Select a Resource identifier: Resource Name or Resource Labels.
If you choose Resource Name to identify a pod or a node, enter a string.
If you choose Resource Labels provide a key value pair for eg: app=mangle. Since multiple resources can have the same label, you also need to specify if you are interested in a Random Injection. If "Random Injection" is set to true, Mangle will randomly choose one resource in a list of resources identified using the label, for introducing the fault. If "Random Injection" is set to false, it will introduce fault into all resources identified using the resource label.
Provide an app container name. Please note that the application specified should have a readiness probe configured for this fault to be triggered successfully.
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". Remediation requests are not supported for this fault.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kubernetes Service Not Available

Kubernetes (K8s) Service Not Available faults enable you to abruptly make a service resource in K8s cluster not available, although the pod will be healthy and running.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> K8S ---> Service Unavailable.
Select an Endpoint (Only K8S endpoints are listed).
Choose the appropriate service identifier: Service Name or Service Labels.
If you choose Service Name, enter an appropriate string.
If you choose Service Labels provide a key value pair for eg: app=mangle. Since multiple resources can have the same label, you also need to specify if you are interested in a Random Injection. If "Random Injection" is set to true, Mangle will randomly choose one resource in a list of resources identified using the label, for introducing the fault. If "Random Injection" is set to false, it will introduce fault into all resources identified using the resource label.
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". Remediation requests are not supported for this fault.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

vCenter Disk Fault

vCenter Disk faults enable you to abruptly disconnect disks from a virtual machine in its inventory. It requires the VM Disk ID and VM Name to trigger the fault. For all vCenter faults, Mangle talks to the mangle-vc-adapter to connect and perform the required action on VC. So it is mandatory that you install the mangle-vc-adapter container prior to adding vCenter Endpoints or running vCenter faults. To find how to install and configure the mangle-vc-adapter, please refer here.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> vCenter ---> Disk.
Select an Endpoint (Only vCenter endpoints are listed).
Select the fault: Disconnect Disk.
Provide the VM Name and VM Disk ID. To identify the disk id, the VM moid is required. This information can be gathered from the vCenter MOB (Managed Object Browser). Refer to Looking up Managed Object Reference for vCenter for help on this. Once you have retrieved the VM moid, the disk id can be retrieved from the disk properties section in the link below after replacing the values in angle braces <>:
https:///mob/?moid=&doPath=layout
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

vCenter NIC Fault

vCenter NIC faults enable you to abruptly disconnect network interface cards from a virtual machine in its inventory. It requires the VM Nic ID and VM Name to trigger the fault. For all vCenter faults, Mangle talks to the mangle-vc-adapter to connect and perform the required action on VC. So it is mandatory that you install the mangle-vc-adapter container prior to adding vCenter Endpoints or running vCenter faults. To find how to install and configure the mangle-vc-adapter, please refer here.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> vCenter ---> Nic.
Select an Endpoint (Only vCenter endpoints are listed).
Select the fault: Disconnect Nic.
Provide the VM Name and VM Nic ID. To identify the Nic id, the VM moid is required. This information can be gathered from the vCenter MOB (Managed Object Browser). Refer to Looking up Managed Object Reference for vCenter for help on this. Once you have retrieved the VM moid, the disk id can be retrieved from the deviceConfigId section in the link below after replacing the values in angle braces <>:
https:///mob/?moid=&doPath=guest%2enet
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

vCenter VM State Change Fault

vCenter VM State Change faults enable you to abruptly power-off, reset or suspend any virtual machine in its inventory. It requires just the VM Name to trigger the fault. For all vCenter faults, Mangle talks to the mangle-vc-adapter to connect and perform the required action on VC. So it is mandatory that you install the mangle-vc-adapter container prior to adding vCenter Endpoints or running vCenter faults. To find how to install and configure the mangle-vc-adapter, please refer here.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> vCenter ---> State.
Select an Endpoint (Only vCenter endpoints are listed).
Select one of the faults: Poweroff, Suspend or Reset VM.
Provide the VM Name.
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

AWS EC2 State Change Fault

AWS EC2 State Change fault enables you to abruptly terminate, stop or reboot any EC2 instance. It requires AWS tags to uniquely identify instances against which the fault should run.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> AWS ---> EC2 ---> State.
Select an Endpoint (Only AWS accounts are listed).
Select one of the faults: Terminate_Instances, Stop_Instances, Reboot_Instances.
Provide the AWS tag (key value pair to uniquely identify the instance(s). Since multiple instances can have the same tag, you also need to specify if you are interested in a Random Injection. If "Random Injection" is set to true, Mangle will randomly choose one instance from a list of instances identified using the tag, for introducing the fault. If "Random Injection" is set to false, it will introduce the fault into all the instances identified using the tag.
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

AWS EC2 Network Fault

AWS EC2 Network fault enable you to abruptly terminate, stop or reboot any EC2 instance. It requires AWS tags to uniquely identify instances against which the fault should run.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> AWS ---> EC2 ---> Network.
Select an Endpoint (Only AWS accounts are listed).
Select the faults: Block_All_Network_Traffic.
Provide the AWS tag (key value pair to uniquely identify the instance(s). Since multiple instances can have the same tag, you also need to specify if you are interested in a Random Injection. If "Random Injection" is set to true, Mangle will randomly choose one instance from a list of instances identified using the tag, for introducing the fault. If "Random Injection" is set to false, it will introduce the fault into all the instances identified using the tag.
Schedule options are not available for this fault.
Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at anytime can be found on clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Relevant API Reference

For access to relevant API Swagger documentation:

Please traverse to link -----> API Documentation from the Mangle UI or access https:///mangle-services/swagger-ui.html#/fault-injection-controller

PrécédentInjecting Faults SuivantApplication Faults

Mis à jour il y a 4 ans

Ce contenu vous a-t-il été utile ?