

Adding Endpoints

Supported Endpoints

An endpoint in Mangle is an infrastructure component that serves as the primary target for your chaos engineering experiments.

For version 1.0, Mangle supported four types of endpoints:

Kubernetes
Docker
VMware vCenter
Remote Machine

From version 2.0, apart from the four endpoints listed above, support has been extended to the following new endpoint:

AWS (Amazon Web Services)

Kubernetes Endpoint

Mangle supports K8s clusters as endpoints or targets for injection. It needs a kubeconfig file to connect to the cluster and run the supported faults. If a kubeconfig file is not provided, Mangle assumes that it is running on a K8s cluster and targets the same cluster for fault injection.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Endpoint tab ---> Kubernetes Cluster.
Click on the button to add a new endpoint.
Enter a name, credential (kubeconfig file), namespace (mandatory; specify "default" if you are unsure of the namespace, otherwise provide the actual name) and tags (additional tags that should be sent to the enabled metric provider to uniquely identify faults against that endpoint), then click on Test Connection.
If Test Connection succeeds, click on Submit.
A success message is displayed and the Endpoints table is updated with the new entry.
Click on the icon against a table entry to see the supported operations.

Docker Endpoint

Mangle supports Docker hosts as endpoints or targets for injection. It needs the IP/Hostname, port details and certificate details (if TLS is enabled for the Docker host with the --tlsverify option specified) to connect to the Docker host and run the supported faults.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Endpoint tab.
Click on the button to add a new endpoint.
Enter a name, IP/Hostname, port details, tags (additional tags that should be sent to the enabled metric provider to uniquely identify faults against that endpoint) and certificate details (if TLS is enabled for the Docker host), then click on Test Connection.
If Test Connection succeeds, click on Submit.
A success message is displayed and the Endpoints table is updated with the new entry.
Click on the icon against a table entry to see the supported operations.

VMware vCenter Endpoint

Mangle supports VMware vCenter as an endpoint or target for injection. It needs the IP/Hostname, credentials and a vCenter adapter URL to connect to the vCenter instance and run the supported faults.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Endpoint tab.
Click on the button to add a new endpoint.
Enter a name, IP/Hostname, credentials, vCenter Adapter URL (format: "https://<IP/hostname>:<port>", where the IP/hostname is that of the Docker host on which the adapter container runs, followed by the port used), username, password and tags (additional tags that should be sent to the enabled metric provider to uniquely identify faults against that endpoint), then click on Test Connection.
If Test Connection succeeds, click on Submit.
A success message is displayed and the Endpoints table is updated with the new entry.
Click on the icon against a table entry to see the supported operations.

When the vCenter adapter is deployed on the same machine on which Mangle is running, the vCenter adapter IP used for adding the vCenter endpoint can be either:

An internal Docker container IP, or
A Docker host IP

To find the internal Docker container IP for mangle-vc-adapter, run:

docker inspect --format '{{.NetworkSettings.IPAddress}}' mangle-vc-adapter
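The adapter URL described in the steps above is a host (container IP or Docker host IP) followed by a port. A minimal sketch of assembling and sanity-checking such a URL is given here; the function name and the sample values are hypothetical and are not part of Mangle.

```python
def adapter_url(host: str, port: int) -> str:
    """Build a vCenter adapter URL of the form https://<host>:<port>.

    `host` is either the internal Docker container IP or the Docker host
    IP/hostname; `port` is the port the adapter container listens on.
    """
    host = host.strip()
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"https://{host}:{port}"


if __name__ == "__main__":
    # Placeholder values for illustration only, not Mangle defaults.
    print(adapter_url("172.17.0.2", 8443))
```

The host argument can be the value printed by the docker inspect command when the adapter runs on the same machine as Mangle.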

Remote Machine Endpoint

Mangle supports any remote machine with SSH enabled as an endpoint or target for injection. It needs the IP/Hostname, credentials (either password or private key), SSH details, OS type and tags to connect to the remote machine and run the supported faults.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Endpoint tab.
Click on the button to add a new endpoint.
Enter a name, IP/Hostname, credentials (either password or private key), SSH port, SSH timeout, OS type and tags (additional tags that should be sent to the enabled metric provider to uniquely identify faults against that endpoint), then click on Test Connection.
If Test Connection succeeds, click on Submit.
A success message is displayed and the Endpoints table is updated with the new entry.
Click on the icon against a table entry to see the supported operations.

AWS (Amazon Web Services)

Mangle supports AWS as an endpoint or target for injection. It needs the Region, credentials (Access key ID and Secret key) and tags to connect to AWS and run the supported faults. Currently, the only supported service is EC2. However, there are plans to extend this to other important AWS services.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Endpoint tab.
Click on the button to add a new endpoint.
Enter a name, Region, credentials (Access key ID and Secret key) and tags (additional tags that should be sent to the enabled metric provider to uniquely identify faults against that endpoint), then click on Test Connection.
If Test Connection succeeds, click on Submit.
A success message is displayed and the Endpoints table is updated with the new entry.
Click on the icon against a table entry to see the supported operations.

Relevant API Reference

For access to Swagger documentation:

In the Mangle UI, follow the link to API Documentation, or access https://<Mangle IP or hostname>/mangle-services/swagger-ui.html#/endpoint-controller

Injecting Faults

Mangle supports two broad categories of faults:

Infrastructure Faults
Application Faults

Infrastructure Faults are a set of faults that target IaaS components where developers host and run their applications. For example, this might be a virtual machine or an AWS EC2 instance where the application runs as a service, a Docker host where the application containers are hosted, or a K8s cluster where the pods host the application. These components are usually shared by multiple applications running on the same infrastructure and are referred to as endpoints in Mangle. Faults against these components will therefore impact multiple applications unless they have different levels of fault tolerance.

Application Faults are a set of faults that target specific applications running within a given infrastructure component or endpoint. For example, this could be a specific Tomcat application running within a virtual machine or an AWS EC2 instance, or Java applications running within containers on a Docker host or K8s pods. Faults against applications typically impact just that application and ideally should not bring down any other application that runs on the same infrastructure or depends on the affected service. If they do, your system is prone to cascading failures and should be examined in great detail to improve fault tolerance levels.

Infrastructure Faults

For version 1.0, Mangle supported the following types of infrastructure faults:

CPU Fault
Memory Fault
Disk IO Fault
Kill Process Fault
Docker State Change Faults
Kubernetes Delete Resource Fault
Kubernetes Resource Not Ready Fault
vCenter Disk Fault
vCenter NIC Fault
vCenter VM State Change Fault

From version 2.0, apart from the faults listed above, support has been extended to the following new faults:

File Handler Leak Fault
Disk Space Fault
Kernel Panic Fault
Network Faults: Packet Delay, Packet Duplication, Packet Loss, Packet Corruption
Kubernetes Service Unavailable Fault
AWS EC2 State Change Fault
AWS EC2 Network Fault

Minor improvements have also been included for Kill Process Fault in version 2 of Mangle.

CPU Fault

CPU fault enables spiking CPU usage values for a selected endpoint by a percentage specified by the user. The timeout field specifies the duration of the fault run, after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> CPU.
Select an Endpoint.
Provide a "CPU Load" value. For example, 80 to simulate a CPU usage of 80% on the selected Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain a CPU load of 80% for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
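Because the "Timeout" field is expressed in milliseconds, it can help to compute the value rather than type it by hand. The following is a minimal sketch of that conversion; the helper name timeout_ms is hypothetical and is not part of Mangle.

```python
from datetime import timedelta


def timeout_ms(hours: float = 0, minutes: float = 0, seconds: float = 0) -> int:
    """Convert a human-readable duration into whole milliseconds."""
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if delta <= timedelta(0):
        raise ValueError("timeout must be a positive duration")
    # Floor division of two timedeltas yields an exact integer count.
    return delta // timedelta(milliseconds=1)


if __name__ == "__main__":
    # 1 hour, as in the example from the steps.
    print(timeout_ms(hours=1))
```

The same helper applies to every fault in this section that accepts a "Timeout" value.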

Memory Fault

Memory fault enables spiking memory usage values for a selected endpoint by a percentage specified by the user. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Memory.
Select an Endpoint.
Provide a "Memory Load" value. For example, 80 to simulate a Memory usage of 80% on the selected Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain a Memory load of 80% for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Disk IO Fault

Disk IO fault enables spiking disk IO operation for a selected endpoint by an IO size specified by the user in bytes. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Disk IO.
Select an Endpoint.
Provide an "IO Size" value in bytes. For example, to write to the disk of the selected Endpoint in blocks of 5 KB, specify the IO Size as 5120 (5 KB = 5120 bytes). With a block size of 5120 bytes, Mangle will not use more than 5 MB (5 MB = 5120 * 1024 bytes) of disk space during the simulation of the fault. The space is cleared at the time of fault remediation.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain an IO load of 8192000 for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
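The IO Size arithmetic from the steps (5 KB = 5120 bytes, at most 5120 * 1024 bytes of disk space) can be checked with a short sketch. The helper names are hypothetical, and applying the factor of 1024 to IO sizes other than 5120 is an assumption extrapolated from the single example in the text.

```python
BYTES_PER_KB = 1024  # the document uses 1 KB = 1024 bytes


def io_size_bytes(kilobytes: int) -> int:
    """Convert a block size in KB to the byte value expected by "IO Size"."""
    if kilobytes <= 0:
        raise ValueError("block size must be positive")
    return kilobytes * BYTES_PER_KB


def max_disk_usage_bytes(io_size: int) -> int:
    """Upper bound on disk space used during the fault.

    Assumption: the bound scales as io_size * 1024, matching the
    documented example of 5120 bytes -> 5 MB.
    """
    return io_size * 1024


if __name__ == "__main__":
    size = io_size_bytes(5)
    print(size, max_disk_usage_bytes(size))
```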

Kill Process Fault

Kill Process fault enables abrupt termination of any process that is running on the specified endpoint. Unlike other infrastructure faults such as CPU, Memory and Disk IO, this fault does not have a timeout field because the fault completes very quickly. Some processes/services may be configured for auto-start and some might require a manual start command to be executed. In the first case, remediation through Mangle is not needed. In the second case, you can specify the remediation command that Mangle should use to start the process again. After the fault is completed, and if the remediation command was specified accurately, a manual remediation can be triggered from the Requests and Reports tab.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Kill Process.
Select an Endpoint.
Provide a "Process Identifier". This can be either a process id or a process name. A process name is preferred if the fault is expected to be scheduled.
From version 2.0 onward, Kill Process fault can kill multiple processes with the same process descriptor. This can be done by setting the "Kill All" drop-down to true. If it is set to false, the fault will fail if the process descriptor is not unique. Alternatively, you can use the process id to uniquely identify and kill a process.
Provide a "Remediation Command". For example, to start the sshd process that was killed on an Ubuntu 17 Server, specify the remediation command as "sudo service ssh start".
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
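The interaction between "Process Identifier" and "Kill All" described in the steps can be illustrated with a small model. This is only a sketch of the documented behaviour, not Mangle's implementation; the function name, the process table and the error types are all hypothetical.

```python
from typing import Dict, List


def select_pids(processes: Dict[int, str], identifier: str, kill_all: bool) -> List[int]:
    """Return the process ids that a kill request would target.

    `processes` maps pid -> process name. A purely numeric identifier is
    treated as a pid here (a simplification for the sketch); anything
    else is treated as a process name.
    """
    if identifier.isdigit():
        pid = int(identifier)
        if pid not in processes:
            raise LookupError(f"no process with id {pid}")
        return [pid]

    matches = sorted(pid for pid, name in processes.items() if name == identifier)
    if not matches:
        raise LookupError(f"no process named {identifier!r}")
    if len(matches) > 1 and not kill_all:
        # Mirrors the documented rule: with Kill All set to false, the
        # fault fails when the descriptor is not unique.
        raise ValueError(f"{identifier!r} matches {len(matches)} processes")
    return matches


if __name__ == "__main__":
    table = {101: "sshd", 202: "nginx", 203: "nginx"}
    print(select_pids(table, "nginx", kill_all=True))
```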

File Handler Leak Fault

File Handler Leak fault enables you to simulate conditions where a program requests a handle to a resource but does not release it when the resource is no longer in use. If this condition persists over extended periods of time, it leads to "Too many open file handles" errors and causes performance degradation or crashes. The timeout field specifies the duration of the fault run, after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> File Handler Leak.
Select an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain the out of file handles error for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Disk Space Fault

Disk Space Fault enables you to simulate out of disk or low disk space conditions. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Disk Space.
Select an Endpoint.
Provide a "Target Directory" so Mangle can target a specific directory location or partition to write to for simulating the low disk space condition.
Provide a "Load" value. For example, 80 to simulate a Disk usage of 80% of the total disk size or of the space allocated for a partition, on the selected Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain the low disk or out of disk condition for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
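To get a feel for what a given "Load" value means on a particular partition, the amount of data needed to reach that usage level can be estimated beforehand. The sketch below only does the arithmetic; how Mangle itself fills the disk is not described in this guide, and the function name is hypothetical.

```python
import shutil


def bytes_to_fill(total: int, used: int, load_percent: int) -> int:
    """Bytes that must be written so that usage reaches load_percent of total."""
    if not 0 < load_percent <= 100:
        raise ValueError("load_percent must be in the range 1..100")
    target_used = total * load_percent // 100
    # Nothing to write if the partition is already at or above the target.
    return max(0, target_used - used)


if __name__ == "__main__":
    # Inspect the partition holding "/" on the local machine.
    usage = shutil.disk_usage("/")
    print(bytes_to_fill(usage.total, usage.used, 80))
```

Running the estimate against the "Target Directory" before injecting the fault indicates how much space the endpoint stands to lose for the duration of the timeout.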

Kernel Panic Fault

Kernel Panic Fault simulates conditions where the operating system abruptly stops in order to prevent further damage, security breaches or data corruption, and to facilitate diagnosis of a sudden hardware or software failure.

REMEDIATION OPTIONS FOR KERNEL PANIC

Remediation for Kernel Panic is controlled by the operating system itself. On Linux systems, a Kernel Panic is usually followed by a system reboot. In some cases, however, the settings specified in the file /etc/sysctl.d/99-sysctl.conf prevent the automatic reboot. In such cases, a manual reboot needs to be triggered on the endpoint to bring it back to a usable state.

To modify this setting as a one-time option, run the following command on the endpoint: sysctl --system

To modify this setting permanently, log in to the endpoint remotely, edit the file /etc/sysctl.d/99-sysctl.conf and add the following line, which makes the kernel reboot 20 seconds after a panic:

kernel.panic = 20
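Before running the fault, it can be useful to check which kernel.panic value a sysctl configuration file would apply. The following is a minimal sketch of such a check, assuming the standard sysctl.conf syntax (key = value, with lines starting with # or ; treated as comments); the function name is hypothetical.

```python
from typing import Optional


def panic_timeout(conf_text: str) -> Optional[int]:
    """Return the kernel.panic value set in sysctl-style text, if any.

    A positive value is the number of seconds the kernel waits before
    rebooting after a panic; 0 means it does not reboot automatically.
    """
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.panic":
            value = int(val.strip())  # a later entry overrides an earlier one
    return value


if __name__ == "__main__":
    sample = "# reboot after a panic\nkernel.panic = 20\n"
    print(panic_timeout(sample))
```

On an endpoint, the text would come from reading /etc/sysctl.d/99-sysctl.conf; a result of None means the file does not set the value at all.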

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Kernel Panic.
Select an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Network Faults

Network Faults enable you to simulate unfavorable conditions such as packet delay, packet duplication, packet loss and packet corruption. The timeout field specifies the duration of the fault run, after which Mangle triggers the automatic remediation procedure.

Packet Delay

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Delay.
Select an Endpoint.
Provide a "Nic Name". For example, for a remote machine endpoint the NIC name could be eth0, eth1 and so on, depending on which adapter you want to target for the fault.
Provide a "Latency" value in milliseconds. For example, 1000 to simulate a packet delay of 1 second on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain the packet delay for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Packet Duplication

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Duplicate.
Select an Endpoint.
Provide a "Nic Name". For example, for a remote machine endpoint the NIC name could be eth0, eth1 and so on, depending on which adapter you want to target for the fault.
Provide a "Percentage" value to specify what percentage of the packets should be duplicated. For example, 10 to simulate a packet duplication of 10 percent on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain the packet duplication for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Packet Loss

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Loss.
Select an Endpoint.
Provide a "Nic Name". For example, for a remote machine endpoint the NIC name could be eth0, eth1 and so on, depending on which adapter you want to target for the fault.
Provide a "Percentage" value to specify what percentage of the packets should be dropped. For example, 10 to simulate a packet drop of 10 percent on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain the packet drop for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Packet Corruption

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Corruption.
Select an Endpoint.
Provide a "Nic Name". For example, for a remote machine endpoint the NIC name could be eth0, eth1 and so on, depending on which adapter you want to target for the fault.
Provide a "Percentage" value to specify what percentage of the packets should be corrupted. For example, 10 to simulate a packet corruption of 10 percent on a particular network interface of an Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, to sustain the packet corruption for 1 hour, provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Metric Providers at the time of publishing events for fault injection and remediation. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The option to trigger a remediation request at any time can be found by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
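The four network faults share the same inputs: a NIC name, a timeout, and either a latency (Packet Delay) or a percentage (Packet Duplication, Packet Loss, Packet Corruption). A minimal sketch of validating these inputs before submitting them is given here; the class and field names are hypothetical and do not reflect Mangle's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetworkFaultParams:
    """Hypothetical container for the fields entered in the network fault forms."""

    nic_name: str
    timeout_ms: int
    latency_ms: Optional[int] = None  # Packet Delay only
    percentage: Optional[int] = None  # Duplication, Loss and Corruption

    def __post_init__(self) -> None:
        if not self.nic_name.strip():
            raise ValueError("nic_name must not be empty")
        if self.timeout_ms <= 0:
            raise ValueError("timeout_ms must be positive")
        # Exactly one of latency_ms / percentage applies to a given fault.
        if (self.latency_ms is None) == (self.percentage is None):
            raise ValueError("set exactly one of latency_ms or percentage")
        if self.latency_ms is not None and self.latency_ms <= 0:
            raise ValueError("latency_ms must be positive")
        if self.percentage is not None and not 0 < self.percentage <= 100:
            raise ValueError("percentage must be in the range 1..100")


if __name__ == "__main__":
    # A 1 second delay on eth0, sustained for 1 hour.
    print(NetworkFaultParams("eth0", 3600000, latency_ms=1000))
```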

Docker State Change

Docker State Change faults enable you to abruptly stop or pause containers running on a Docker host. Unlike other infrastructure faults like CPU, Memory and Disk IO this fault is specific to the Docker endpoint and does not have a timeout field because the fault completes very quickly. Some containers may be configured for auto-start and some might require a manual start command to be executed. For the first case, auto-remediation through Mangle is not needed. For the second case, a manual remediation can be triggered from the Requests and Reports tab after the fault completes.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Docker ---> State Change.
Select an Endpoint (Only Docker Endpoints are listed).
Select the fault.
Provide a "Container Name".
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kubernetes Delete Resource

Kubernetes (K8s) Delete Resource faults enable you to abruptly delete pods or nodes within a K8s cluster. Unlike other infrastructure faults such as CPU, Memory and Disk IO, this fault is specific to the K8s endpoint and does not have a timeout field because the fault completes very quickly. In most cases, K8s will automatically replace the deleted resource. This fault lets you see how the applications hosted on these pods behave during rescheduling, when a K8s resource is deleted and re-created.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> K8S ---> Delete Resource.
Select an Endpoint (Only K8S endpoints are listed).
Select a Resource Type: POD or NODE.
Select a Resource identifier: Resource Name or Resource Labels.
If you choose Resource Name to identify a pod or a node, enter a string.
If you choose Resource Labels, provide a key-value pair, for example app=mangle. Since multiple resources can have the same label, you also need to specify whether you want a Random Injection. If "Random Injection" is set to true, Mangle randomly chooses one resource from the list of resources identified by the label and introduces the fault into it. If "Random Injection" is set to false, Mangle introduces the fault into all resources identified by the label.
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". Remediation requests are not supported for this fault.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
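
The label matching and "Random Injection" behaviour described in the steps above can be pictured with the following sketch. It is not Mangle's implementation: the class LabelTargetSelector, its methods and the in-memory map of resources are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch, not Mangle code: picks fault targets by label.
class LabelTargetSelector {

    // Parses a "key=value" label such as "app=mangle".
    static Map.Entry<String, String> parseLabel(String label) {
        int idx = label.indexOf('=');
        if (idx <= 0 || idx == label.length() - 1) {
            throw new IllegalArgumentException("Expected key=value, got: " + label);
        }
        return Map.entry(label.substring(0, idx), label.substring(idx + 1));
    }

    // resources maps a resource name to its labels.
    // randomInjection = true  -> one randomly chosen matching resource
    // randomInjection = false -> every matching resource
    static List<String> selectTargets(Map<String, Map<String, String>> resources,
                                      String label, boolean randomInjection, Random rng) {
        Map.Entry<String, String> wanted = parseLabel(label);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : resources.entrySet()) {
            if (wanted.getValue().equals(e.getValue().get(wanted.getKey()))) {
                matches.add(e.getKey());
            }
        }
        if (randomInjection && !matches.isEmpty()) {
            return List.of(matches.get(rng.nextInt(matches.size())));
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> pods = new TreeMap<>();
        pods.put("pod-a", Map.of("app", "mangle"));
        pods.put("pod-b", Map.of("app", "mangle"));
        pods.put("pod-c", Map.of("app", "other"));
        // All matching pods are targeted.
        System.out.println(selectTargets(pods, "app=mangle", false, new Random()));
        // Exactly one of the matching pods is targeted.
        System.out.println(selectTargets(pods, "app=mangle", true, new Random()));
    }
}
```

With "Random Injection" set to false, a broad label can affect many resources at once, so it is worth checking how many resources carry the label before running the fault.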

Kubernetes Resource Not Ready

Kubernetes (K8s) Resource Not Ready faults enable you to abruptly put pods or nodes within a K8s cluster into a state that is not suitable for scheduling.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> K8S ---> Resource Not Ready.
Select an Endpoint (Only K8S endpoints are listed).
Select a Resource Type: POD or NODE.
Select a Resource identifier: Resource Name or Resource Labels.
If you choose Resource Name to identify a pod or a node, enter a string.
If you choose Resource Labels, provide a key-value pair, for example app=mangle. Since multiple resources can have the same label, you also need to specify whether you want a Random Injection. If "Random Injection" is set to true, Mangle randomly chooses one resource from the list of resources identified by the label and introduces the fault into it. If "Random Injection" is set to false, Mangle introduces the fault into all resources identified by the label.
Provide an app container name. Please note that the application specified should have a readiness probe configured for this fault to be triggered successfully.
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". Remediation requests are not supported for this fault.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kubernetes Service Not Available

Kubernetes (K8s) Service Not Available faults enable you to abruptly make a service resource in a K8s cluster unavailable, even though the pod remains healthy and running.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> K8S ---> Service Unavailable.
Select an Endpoint (Only K8S endpoints are listed).
Choose the appropriate service identifier: Service Name or Service Labels.
If you choose Service Name, enter an appropriate string.
If you choose Service Labels, provide a key-value pair, for example app=mangle. Since multiple resources can have the same label, you also need to specify whether you want a Random Injection. If "Random Injection" is set to true, Mangle randomly chooses one resource from the list of resources identified by the label and introduces the fault into it. If "Random Injection" is set to false, Mangle introduces the fault into all resources identified by the label.
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". Remediation requests are not supported for this fault.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

vCenter Disk Fault

vCenter Disk faults enable you to abruptly disconnect disks from a virtual machine in the vCenter inventory. The fault requires the VM Disk ID and VM Name. For all vCenter faults, Mangle talks to the mangle-vc-adapter to connect to vCenter and perform the required action, so you must install the mangle-vc-adapter container before adding vCenter Endpoints or running vCenter faults. To find out how to install and configure the mangle-vc-adapter, please refer here.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> vCenter ---> Disk.
Select an Endpoint (Only vCenter endpoints are listed).
Select the fault: Disconnect Disk.
Provide the VM Name and VM Disk ID. To identify the disk id, the VM moid is required. This information can be gathered from the vCenter MOB (Managed Object Browser). Refer to Looking up Managed Object Reference for vCenter for help on this. Once you have retrieved the VM moid, the disk id can be retrieved from the disk properties section in the link below after replacing the values in angle brackets <>:
https://<vCenter IP/Hostname>/mob/?moid=<VM moid>&doPath=layout
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
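
As an illustration of the MOB link used in the steps above, the sketch below assembles the URL from a vCenter host and a VM moid. The class MobUrlBuilder, the host vcenter.example.com and the moid vm-42 are hypothetical values; the placement of the host and the moid in the link is an assumption based on the link pattern shown in the steps.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Mangle: builds the Managed Object
// Browser link that shows the disk layout of a VM.
class MobUrlBuilder {

    // vcenterHost: IP or hostname of the vCenter server (assumed input).
    // vmMoid: managed object id of the VM, looked up in the MOB.
    static URI layoutUrl(String vcenterHost, String vmMoid) {
        String moid = URLEncoder.encode(vmMoid, StandardCharsets.UTF_8);
        return URI.create("https://" + vcenterHost + "/mob/?moid=" + moid + "&doPath=layout");
    }

    public static void main(String[] args) {
        // Made-up host and moid, for illustration only.
        System.out.println(layoutUrl("vcenter.example.com", "vm-42"));
    }
}
```

Opening the resulting link in a browser session that is logged in to vCenter should show the layout properties from which the disk id is read.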

vCenter NIC Fault

vCenter NIC faults enable you to abruptly disconnect network interface cards from a virtual machine in the vCenter inventory. The fault requires the VM Nic ID and VM Name. For all vCenter faults, Mangle talks to the mangle-vc-adapter to connect to vCenter and perform the required action, so you must install the mangle-vc-adapter container before adding vCenter Endpoints or running vCenter faults. To find out how to install and configure the mangle-vc-adapter, please refer here.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> vCenter ---> Nic.
Select an Endpoint (Only vCenter endpoints are listed).
Select the fault: Disconnect Nic.
Provide the VM Name and VM Nic ID. To identify the Nic id, the VM moid is required. This information can be gathered from the vCenter MOB (Managed Object Browser). Refer to Looking up Managed Object Reference for vCenter for help on this. Once you have retrieved the VM moid, the Nic id can be retrieved from the deviceConfigId section in the link below after replacing the values in angle brackets <>:
https://<vCenter IP/Hostname>/mob/?moid=<VM moid>&doPath=guest%2enet
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

vCenter VM State Change Fault

vCenter VM State Change faults enable you to abruptly power off, reset or suspend any virtual machine in the vCenter inventory. The fault requires only the VM Name. For all vCenter faults, Mangle talks to the mangle-vc-adapter to connect to vCenter and perform the required action, so you must install the mangle-vc-adapter container before adding vCenter Endpoints or running vCenter faults. To find out how to install and configure the mangle-vc-adapter, please refer here.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> vCenter ---> State.
Select an Endpoint (Only vCenter endpoints are listed).
Select one of the faults: Poweroff, Suspend or Reset VM.
Provide the VM Name.
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

AWS EC2 State Change Fault

AWS EC2 State Change fault enables you to abruptly terminate, stop or reboot any EC2 instance. It requires AWS tags to uniquely identify instances against which the fault should run.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> AWS ---> EC2 ---> State.
Select an Endpoint (Only AWS accounts are listed).
Select one of the faults: Terminate_Instances, Stop_Instances, Reboot_Instances.
Provide the AWS tag (a key-value pair) to uniquely identify the instance(s). Since multiple instances can have the same tag, you also need to specify whether you want a Random Injection. If "Random Injection" is set to true, Mangle randomly chooses one instance from the list of instances identified by the tag and introduces the fault into it. If "Random Injection" is set to false, Mangle introduces the fault into all the instances identified by the tag.
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

AWS EC2 Network Fault

AWS EC2 Network fault enables you to abruptly block all network traffic for any EC2 instance. It requires AWS tags to uniquely identify instances against which the fault should run.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Infrastructure Faults ---> AWS ---> EC2 ---> Network.
Select an Endpoint (Only AWS accounts are listed).
Select the fault: Block_All_Network_Traffic.
Provide the AWS tag (a key-value pair) to uniquely identify the instance(s). Since multiple instances can have the same tag, you also need to specify whether you want a Random Injection. If "Random Injection" is set to true, Mangle randomly chooses one instance from the list of instances identified by the tag and introduces the fault into it. If "Random Injection" is set to false, Mangle introduces the fault into all the instances identified by the tag.
Schedule options are not available for this fault.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Relevant API Reference

For access to relevant API Swagger documentation:

Please follow the link to API Documentation from the Mangle UI, or access https://<Mangle IP/Hostname>/mangle-services/swagger-ui.html#/fault-injection-controller

Application Faults

For version 1.0, Mangle supported the following types of application faults:

CPU Fault
Memory Fault

From version 2.0, apart from the faults listed above, support has been extended to the following new faults:

File Handler Leak Fault
Thread Leak Fault
Java Method Latency Fault
Spring Service Latency Fault
Spring Service Exception Fault
Simulate Java Exception
Kill JVM Fault

CPU Fault

CPU fault enables you to spike the CPU usage of a selected application within a specified endpoint by a percentage that you specify. Mangle uses a modified Byteman agent to simulate this fault and currently supports only Java-based applications. The timeout field specifies the duration of the fault run, after which Mangle triggers the automatic remediation procedure, which includes cleaning up the Byteman agent from the target endpoint.

This fault therefore takes additional arguments to identify the application under test.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> CPU.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker arguments such as Container Name.
Provide a "CPU Load" value. For example, enter 80 to simulate a CPU usage of 80% on the selected application.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, if you need the CPU load of 80% to be sustained for 1 hour, provide a timeout value of 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process id of the application or the JVM descriptor name. If you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise, specify the appropriate user id.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". The fault continues to run at the endpoint until the timeout expires or a remediation request is triggered. To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
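
The steps above ask for a Free Port that is not in use. One way to find a candidate is sketched below. The class FreePortFinder is hypothetical and not part of Mangle; it only uses java.net.ServerSocket from the JDK. The sketch assumes it is run on the machine or container that hosts the application under test, since a port that is free elsewhere says nothing about the target.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Mangle: finds or checks a TCP port
// that could be entered in the "Free Port" field.
class FreePortFinder {

    // Binding to port 0 asks the operating system for any unused port.
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the given port can be bound right now.
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int port = findFreePort();
        System.out.println("Candidate free port: " + port);
        System.out.println("Still free: " + isFree(port));
    }
}
```

The check is only a snapshot: another process can claim the port between the check and the fault run, so a failed injection caused by a busy port is still possible.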

Memory Fault

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> Memory.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker arguments such as Container Name.
Provide a "Memory Load" value. For example, enter 80 to simulate a memory usage of 80% on the selected Endpoint.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, if you need the memory load of 80% to be sustained for 1 hour, provide a timeout value of 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process id of the application or the JVM descriptor name. If you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise, specify the appropriate user id.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". The fault continues to run at the endpoint until the timeout expires or a remediation request is triggered. To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

File Handler Leak Fault

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> File Handler Leak.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker arguments such as Container Name.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, if you need the File Handler leak to be sustained for 1 hour, provide a timeout value of 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process id of the application or the JVM descriptor name. If you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise, specify the appropriate user id.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". The fault continues to run at the endpoint until the timeout expires or a remediation request is triggered. To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Thread Leak Fault

Thread Leak fault enables you to simulate conditions where an open thread is not closed. If this condition persists for an extended period of time, it leads to too many open threads, causing thread leaks and out of memory issues. Usually a thread dump is required to troubleshoot such issues. The timeout field specifies the duration of the fault run, after which Mangle triggers the automatic remediation procedure.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> Thread Leak.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker arguments such as Container Name.
Set the Out of Memory required flag to true if you want the thread leak to eventually result in OOM errors.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide a "Timeout" value in milliseconds. For example, if you need the Thread leak to be sustained for 1 hour, provide a timeout value of 3600000 (1 hour = 3600000 ms). After this duration, Mangle will remediate the fault without any manual intervention.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process id of the application or the JVM descriptor name. If you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise, specify the appropriate user id.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that are sent to the active monitoring tool (configured under Mangle Admin settings ---> Metric Providers) when events for fault injection and remediation are published. They are optional.
Click on Run Fault.
The user will be re-directed to the Processed Requests section under Requests & Reports tab.
If Mangle triggers the fault successfully, the status of the task changes to "COMPLETED". The fault continues to run at the endpoint until the timeout expires or a remediation request is triggered. To trigger a remediation request at any time, click the button against the task in the Processed Requests table.
For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
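
To make the condition behind a thread leak more concrete, the sketch below starts threads that never finish until they are explicitly released. It is not how Mangle or the Byteman agent injects the fault; the class ThreadLeakSketch and its methods are hypothetical, and the latch release merely stands in for remediation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of a thread leak, not Mangle code.
class ThreadLeakSketch {

    // Starts `count` daemon threads that block until `release` is counted
    // down. While they block, they stay alive and hold their resources.
    static List<Thread> leakThreads(int count, CountDownLatch release) {
        List<Thread> leaked = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "leaked-" + i);
            t.setDaemon(true); // do not keep the JVM alive on exit
            t.start();
            leaked.add(t);
        }
        return leaked;
    }

    // Counts how many of the given threads are still alive.
    static long countAlive(List<Thread> threads) {
        return threads.stream().filter(Thread::isAlive).count();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> leaked = leakThreads(50, release);
        System.out.println("Alive while leaked: " + countAlive(leaked));
        release.countDown(); // stands in for remediation
        for (Thread t : leaked) {
            t.join();
        }
        System.out.println("Alive after release: " + countAlive(leaked));
    }
}
```

A thread dump taken while the threads are blocked would list every "leaked-N" thread in a waiting state, which is the kind of evidence typically used to troubleshoot a real leak.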

Java Method Latency Fault

Java Method Latency Fault helps you simulate a condition where calls to a specific Java method are delayed by a specific time. Please note that you need to be familiar with the application code, including its Java classes and methods, in order to simulate this fault.

Steps to follow:

Login as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> Java Method Latency.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker arguments such as Container Name.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide "Latency" value in milliseconds so that Mangle can delay calls to the method by that time.
Provide "Class Name" as PluginController if the class of interest is defined as public class PluginController {...}.
Provide "Method Name" as getPlugins if the method to be tested is defined as follows:
public ResponseEntity> getPlugins(
        @RequestParam(value = "pluginId", required = false) String pluginId,
        @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) {
    log.info("PluginController getPlugins() Start.............");
    if (StringUtils.hasLength(pluginId)) {
        return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK);
    }
    return new ResponseEntity<>(pluginService.getExtensions(), HttpStatus.OK);
}
Provide "Rule Event" as "AT ENTRY" or "AT EXIT" to specify whether the fault has to be introduced at the beginning or at the end of the method call.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process ID of the application or the JVM descriptor name. When you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise specify the appropriate user ID.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool configured under Mangle Admin settings ---> Metric Providers when events for fault injection and remediation are published. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under the Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time is available by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log in to either Wavefront or Datadog, once it is configured as an active metric provider in Mangle, and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
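
The Class Name, Method Name, Rule Event and Latency inputs in the steps above correspond closely to the parts of a Byteman rule. The following is only a sketch of what such a rule could look like; this guide does not show the rule text Mangle actually generates, so the result may differ. The class LatencyRuleSketch and the rule name are hypothetical. The RULE, CLASS, METHOD, IF, DO and ENDRULE keywords and the delay() built-in belong to Byteman's rule language.

```java
// Hypothetical helper, not part of Mangle: renders a Byteman rule that delays a method.
final class LatencyRuleSketch {

    static String latencyRule(String className, String methodName, String ruleEvent, long latencyMs) {
        return String.join("\n",
                "RULE delay " + className + "." + methodName, // free-form rule name (assumption)
                "CLASS " + className,
                "METHOD " + methodName,
                ruleEvent,                                    // "AT ENTRY" or "AT EXIT"
                "IF true",                                    // always fire
                "DO delay(" + latencyMs + ")",                // Byteman built-in: pause for N milliseconds
                "ENDRULE");
    }

    public static void main(String[] args) {
        System.out.println(latencyRule("PluginController", "getPlugins", "AT ENTRY", 1000));
    }
}
```

With "AT ENTRY" the delay is applied before the method body runs, and with "AT EXIT" just before the method returns.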

Spring Service Latency Fault

Spring Service Latency Fault helps you simulate a condition where calls to a specific API can be delayed by a specific time. Please note that you would have to be familiar with the REST application URLs and calls in order to simulate this fault.

Steps to follow:

Log in as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> Spring Service Latency.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker argument such as Container Name.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide "Latency" value in milliseconds so that Mangle can delay calls to the method by that time.
Provide "Service URI" as /rest/api/v1/plugins if the REST URL of interest is as follows: https://xxx.vmware.com/mangle-services/rest/api/v1/plugins.
Provide "Http Method" as GET, POST, PUT, PATCH or DELETE as applicable.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process ID of the application or the JVM descriptor name. When you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise specify the appropriate user ID.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool configured under Mangle Admin settings ---> Metric Providers when events for fault injection and remediation are published. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under the Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time is available by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log in to either Wavefront or Datadog, once it is configured as an active metric provider in Mangle, and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
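
The Service URI in the steps above appears to be the part of the REST URL that follows the application's context path. The sketch below derives it from a full URL under that assumption; the class ServiceUriSketch and the idea of passing the context path explicitly are illustrative and not part of Mangle.

```java
import java.net.URI;

// Hypothetical helper, not part of Mangle.
final class ServiceUriSketch {

    // Returns the part of the URL path that follows the application's context path.
    static String serviceUri(String fullUrl, String contextPath) {
        String path = URI.create(fullUrl).getPath();
        if (!path.startsWith(contextPath + "/")) {
            throw new IllegalArgumentException("URL is not under context path " + contextPath);
        }
        return path.substring(contextPath.length());
    }

    public static void main(String[] args) {
        // Prints /rest/api/v1/plugins
        System.out.println(serviceUri(
                "https://xxx.vmware.com/mangle-services/rest/api/v1/plugins", "/mangle-services"));
    }
}
```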

Spring Service Exception Fault

Spring Service Exception Fault helps you simulate a condition where calls to a specific API throw an exception. Please note that you would have to be familiar with the REST application URLs and calls, as well as the application code, classes, methods and exceptions, in order to simulate this fault.

Steps to follow:

Log in as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> Spring Service Exception.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker argument such as Container Name.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide "Service URI" as /rest/api/v1/plugins if the REST URL of interest is as follows: https://xxx.vmware.com/mangle-services/rest/api/v1/plugins.
Provide "Http Method" as GET, POST, PUT, PATCH or DELETE as applicable.
Provide "Exception Class", for example java.lang.NullPointerException if you want a null pointer exception to be thrown.
Provide "Exception Message" as any sample message for testing purposes.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process ID of the application or the JVM descriptor name. When you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise specify the appropriate user ID.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool configured under Mangle Admin settings ---> Metric Providers when events for fault injection and remediation are published. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under the Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time is available by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log in to either Wavefront or Datadog, once it is configured as an active metric provider in Mangle, and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
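
Before entering an Exception Class, it can help to confirm that the fully qualified name resolves to a throwable type that accepts a message. The sketch below is one possible local check, not something Mangle performs as far as this guide states. The class ExceptionClassCheck is hypothetical, and the requirement of a single-String constructor is an assumption based on the Exception Message field. The check runs against the local classpath, which may differ from the classpath of the application under test.

```java
// Hypothetical pre-check, not part of Mangle.
final class ExceptionClassCheck {

    // True if the name resolves to a Throwable subclass with a public (String message) constructor.
    static boolean isUsableExceptionClass(String className) {
        try {
            Class<?> type = Class.forName(className);
            if (!Throwable.class.isAssignableFrom(type)) {
                return false;
            }
            type.getConstructor(String.class); // throws NoSuchMethodException if absent
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsableExceptionClass("java.lang.NullPointerException")); // true
        System.out.println(isUsableExceptionClass("java.lang.String"));               // false: not a Throwable
    }
}
```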

Simulate Java Exception

Java Method Exception Fault helps you simulate a condition where calls to a specific Java method can result in exceptions. Please note that you would have to be familiar with the application code, including its Java classes, methods and exceptions, in order to simulate this fault.

Steps to follow:

Log in as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> Simulate Java Exception.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker argument such as Container Name.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide "Latency" value in milliseconds so that Mangle can delay calls to the method by that time.
Provide "Class Name" as PluginController if the class of interest is defined as public class PluginController {...}.
Provide "Method Name" as getPlugins if the method to be tested is defined as follows:
public ResponseEntity> getPlugins(
@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) {
log.info("PluginController getPlugins() Start.............");
if (StringUtils.hasLength(pluginId)) {
return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK);
}
return new ResponseEntity<>(pluginService.getExtensions(), HttpStatus.OK);
}
Provide "Rule Event" as "AT ENTRY" or "AT EXIT" to specify whether the fault has to be introduced at the beginning or at the end of the method call.
Provide "Exception Class", for example java.lang.NullPointerException if you want a null pointer exception to be thrown.
Provide "Exception Message" as any sample message for testing purposes.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process ID of the application or the JVM descriptor name. When you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise specify the appropriate user ID.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool configured under Mangle Admin settings ---> Metric Providers when events for fault injection and remediation are published. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under the Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time is available by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log in to either Wavefront or Datadog, once it is configured as an active metric provider in Mangle, and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.

Kill JVM Fault

Kill JVM Fault helps you simulate a condition where the JVM crashes with a specific exit code when a specific Java method is called. Please note that you would have to be familiar with the application code, including its Java classes, methods and exceptions, in order to simulate this fault.

Steps to follow:

Log in as a user with read and write privileges to Mangle.
Navigate to Fault Execution tab ---> Application Faults ---> Kill JVM.
Select an Endpoint.
If the Endpoint is of type Kubernetes:
Provide additional K8s arguments such as Container Name, Pod Labels and the Random Injection flag.
If the Endpoint is of type Docker:
Provide additional Docker argument such as Container Name.
Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Otherwise, the default temp location will be used.
Provide "Latency" value in milliseconds so that Mangle can delay calls to the method by that time.
Provide "Class Name" as PluginController if the class of interest is defined as public class PluginController {...}.
Provide "Method Name" as getPlugins if the method to be tested is defined as follows:
public ResponseEntity> getPlugins(
@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) {
log.info("PluginController getPlugins() Start.............");
if (StringUtils.hasLength(pluginId)) {
return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK);
}
return new ResponseEntity<>(pluginService.getExtensions(), HttpStatus.OK);
}
Provide "Rule Event" as "AT ENTRY" or "AT EXIT" to specify whether the fault has to be introduced at the beginning or at the end of the method call.
Select an appropriate "Exit Code" from the drop-down menu.
Provide additional JVM properties such as Java Home, JVM Process, Free Port and Logon User. For example, if the application under test is a VMware application, the JRE for the application resides in a specific location, so for Java Home enter the string /usr/java/jre-vmware/bin/java. The JVM Process can be either the process ID of the application or the JVM descriptor name. When you schedule application faults, it is preferable to specify the JVM descriptor name. The Free Port is used by the Byteman agent to talk to the application, so provide one that is not in use. The Logon User should be a user who has permissions to access and run the application under test. If that user is root, specify root; otherwise specify the appropriate user ID.
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint.
Tags are key-value pairs that will be sent to the active monitoring tool configured under Mangle Admin settings ---> Metric Providers when events for fault injection and remediation are published. They are not mandatory.
Click on Run Fault.
The user will be redirected to the Processed Requests section under the Requests & Reports tab.
If Mangle was able to successfully trigger the fault, the status of the task will change to "COMPLETED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at any time is available by clicking the button against the task in the Processed Requests table.
For monitoring purposes, log in to either Wavefront or Datadog, once it is configured as an active metric provider in Mangle, and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes.
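
All of the application faults above ask for a Free Port that is not in use on the endpoint. One way to find a candidate is to ask the operating system for an unused port, as in the sketch below. The class FreePortCheck is hypothetical and not part of Mangle. To be meaningful, the check has to run on the same host (or inside the same container) as the application under test, and the port could still be taken by another process between the check and the fault execution.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Mangle.
final class FreePortCheck {

    // Binds to port 0 so that the OS picks a currently unused TCP port, then releases it.
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if a listening socket can be bound to the port right now.
    static boolean isFree(int port) {
        try (ServerSocket ignored = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Candidate free port: " + findFreePort());
    }
}
```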

Relevant API Reference

For access to relevant API Swagger documentation:

Please traverse to link -----> API Documentation from the Mangle UI or access https:///mangle-services/swagger-ui.html#/fault-injection-controller

Custom Faults

Task-Extension: An example is available as HelloManglePluginTaskHelper in the package com.vmware.mangle.test.plugin.helpers of mangle-plugin-skeleton. This task helper is an implementation of AbstractRemoteCommandExecutionTaskHelper. An implementation of AbstractRemoteCommandExecutionTaskHelper is only expected to implement the methods below:

public Task init(T faultSpec) throws MangleException;

Provide the implementation that initializes the task helper for executing the fault. The commands required for injection and remediation of the fault are expected to be provided here. The model for providing the command information is explained later.

public Task init(T taskData, String injectedTaskId) throws MangleException;

Provide the implementation that initializes the task helper for executing the fault when an existing task ID is also provided. This method is used for executing the remediation on a task if remediation is available. This initialization is not used for task rerun or re-trigger.

public void executeTask(Task task) throws MangleException;

Provide the implementation for any execution steps required in addition to the implementation available in AbstractRemoteCommandExecutionTaskHelper. Plugin developers can use this method to invoke their own helper implementations to support the fault across the multiple endpoints supported in Mangle.

protected ICommandExecutor getExecutor(Task task) throws MangleException;

Provide the implementation that defines the executor required for the fault execution. Mangle provides a default implementation of an executor for each supported endpoint. Plugin developers are free to use their own executor as long as it implements the interface ICommandExecutor, available in the package com.vmware.mangle.utils.

protected void checkTaskSpecificPrerequisites(Task task) throws MangleException;

Provide the implementation if the fault being developed expects the test machine to satisfy a condition before execution. This step is separated from the fault execution because Mangle wants to make sure that fault execution or remediation will not leave the user environment in an irrecoverable state by running on a machine that does not satisfy the prerequisites.

protected void prepareEndpoint(Task task, List listOfFaultInjectionScripts) throws MangleException;

Provide the implementation if the fault execution needs certain changes to the test machine before execution, for example copying a binary file required to execute the fault. This step is optional, as the predefined implementation already copies the files returned by listFaultInjectionScripts() to the remote machine.

public String getDescription(Task task);

Provide the implementation that generates a description for the fault based on user inputs, to help the user identify the task later through the description. A generic implementation is already available at TaskDescriptionUtils.getDescription(task).

public List listFaultInjectionScripts(Task task);

Provide an implementation that returns details of the support scripts to be copied to the test machine for executing the fault being implemented.

Task-Extension Deep Dive: An example is available as HelloManglePluginTaskHelper in the package com.vmware.mangle.plugin.tasks.impl of mangle-plugin-skeleton. This task helper is an implementation of AbstractRemoteCommandExecutionTaskHelper. An implementation of AbstractRemoteCommandExecutionTaskHelper is only expected to implement the methods below:

public Task init(T faultSpec) throws MangleException ;

public Task init(T taskData, String injectedTaskId) throws MangleException;

public void executeTask(Task task) throws MangleException;

protected ICommandExecutor getExecutor(Task task) throws MangleException;

Provide the implementation that defines the executor required for the fault execution. Mangle provides a default implementation of an executor for each supported endpoint. Plugin developers should use the executor appropriate to the endpoint provided as the target. The mapping of executors to endpoint types is as follows:

REMOTE_MACHINE - SSHUtils
DOCKER - DockerCommandUtils
AWS - AWSCommandExecutor
K8s - KubernetesCommandLineClient
vCENTER - VCenterCommandExecutor

The EndpointClientFactory class of mangle-task-framework can be used to initialize the appropriate executor for injecting the fault as per the user request.

All these executors expect the user to provide a command to be executed on the target machine, with associated metadata used to determine whether it executed successfully.

protected void checkTaskSpecificPrerequisites(Task task) throws MangleException;

public String getDescription(Task task);

public List listFaultInjectionScripts(Task task);

Provide an implementation that returns details of the support scripts to be copied to the test machine for executing the fault being implemented. A support file can be any file that must be placed on the target in order to execute the developed fault. All the out-of-the-box executors are capable of copying files to the corresponding targeted endpoint, and the default implementation of AbstractRemoteCommandExecutionTaskHelper completes the process automatically, provided that the names of the files are returned through the listFaultInjectionScripts() implementation.

private List getInjectionCommandInfoList(T faultSpec) {}

Provide the commands to be executed for the fault to be injected. The commands should be provided as a List of CommandInfo. The fields of CommandInfo and their descriptions are as follows:

private String command; String value of the actual command, which may contain references to members of the pool of variables that will be available to the executor during command execution. The types and the accessing mechanism are explained in the section below.
private boolean ignoreExitValueCheck; Boolean value that determines whether the result of a command execution should take the return value of the command into account. Can be given as false where the command execution may not always produce a success return value, and the result is instead based on the command output.
private List expectedCommandOutputList; List of patterns used to validate the command execution output and decide whether the execution succeeded. The patterns are combined with a logical ‘or’ by default.
private int noOfRetries; Number of retries to be attempted by the executor before marking the command execution as a failure.
private int retryInterval; Interval in seconds between two attempts of a command execution when the execution fails and retries are configured.
private int timeout; Timeout in milliseconds after which a command execution is considered a failure if the executor has not received a response from the target.
private Map knownFailureMap; Mapping of patterns to be looked for in the command execution output, used to give the user easier troubleshooting messages by masking stack traces in the result.
private List commandOutputProcessingInfoList; Explained in detail below.
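
The logical 'or' relation among the entries of expectedCommandOutputList can be pictured with the sketch below. It is not Mangle's executor code. The class ExpectedOutputSketch is hypothetical, and treating each entry as a regular expression that may match anywhere in the output, as well as treating an empty list as "no match", are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not Mangle's implementation.
final class ExpectedOutputSketch {

    // Succeeds as soon as the output matches at least one pattern (logical OR).
    static boolean outputMatchesAny(String commandOutput, List<String> expectedCommandOutputList) {
        for (String regex : expectedCommandOutputList) {
            if (Pattern.compile(regex).matcher(commandOutput).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("already running", "injected successfully");
        System.out.println(outputMatchesAny("Fault injected successfully", expected)); // true
        System.out.println(outputMatchesAny("Permission denied", expected));           // false
    }
}
```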

public class CommandOutputProcessingInfo
The fields are:
1. private String regExpression;
  Regular expression pattern used to collect a crucial piece of information from the current command’s execution output and make it available throughout the fault execution.
2. private String extractedPropertyName;
  Name to be given to the information collected using the pattern given as regExpression.
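
The pairing of regExpression and extractedPropertyName can be sketched as follows. This is an illustration rather than Mangle's implementation. The class OutputExtractionSketch and the use of a plain Map as the store are hypothetical, and preferring the first capture group over the whole match is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not Mangle's implementation.
final class OutputExtractionSketch {

    // Stores the text matched by regExpression in the map under extractedPropertyName.
    static void extract(String commandOutput, String regExpression,
                        String extractedPropertyName, Map<String, String> store) {
        Matcher matcher = Pattern.compile(regExpression).matcher(commandOutput);
        if (matcher.find()) {
            // Assumption: use the first capture group if the pattern defines one, else the whole match.
            String value = matcher.groupCount() >= 1 ? matcher.group(1) : matcher.group();
            store.put(extractedPropertyName, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        extract("Started process with PID=4321", "PID=(\\d+)", "pid", store);
        System.out.println(store.get("pid")); // 4321
    }
}
```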
Types of Variables and Their Usage:
The information provided by the user or collected during the runtime of the fault is made available to the command executor as the following types of variables.
1. TaskTroubleShootingInfo of the Task holds the information extracted from the command execution output.
2. args field of CommandExecutionFaultSpec available as taskData in Task holds the data received from the user as args.
3. $FI_ADD_INFO_FieldName can be used to refer to variables from TaskTroubleShootingInfo
4. $FI_ARG_Fieldname can be used to refer to variables from args.
5. $FI_STACK can be used to refer to the output of the previous command.
private List getRemediationCommandInfoList(T faultSpec) {}
Provide the commands for remediating the fault already injected. The semantics of CommandInfo are the same as described in the previous section. The args and TaskTroubleShootingInfo collected during the injection are also available during the execution of remediation. Hence, dependency data from the injection task can be passed to remediation by using the references in the commands.
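
The $FI_ARG_, $FI_ADD_INFO_ and $FI_STACK references described above can be thought of as placeholders that are replaced before a command is run. The sketch below shows one way such a substitution could work. It is not Mangle's implementation: the class CommandReferenceSketch, the plain Maps standing in for args and TaskTroubleShootingInfo, and the choice to leave unknown references untouched are all assumptions.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not Mangle's implementation.
final class CommandReferenceSketch {

    private static final Pattern REFERENCE =
            Pattern.compile("\\$FI_(ADD_INFO|ARG)_(\\w+)|\\$FI_STACK");

    static String resolve(String command, Map<String, String> args,
                          Map<String, String> addInfo, String previousOutput) {
        Matcher matcher = REFERENCE.matcher(command);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String value;
            if (matcher.group(1) == null) {
                value = previousOutput;                 // $FI_STACK
            } else if (matcher.group(1).equals("ARG")) {
                value = args.get(matcher.group(2));     // $FI_ARG_<name>
            } else {
                value = addInfo.get(matcher.group(2));  // $FI_ADD_INFO_<name>
            }
            if (value == null) {
                value = matcher.group();                // unknown reference: keep as written
            }
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        String command = "kill -$FI_ARG_signal $FI_ADD_INFO_pid";
        // Prints: kill -9 4321
        System.out.println(resolve(command,
                Collections.singletonMap("signal", "9"),
                Collections.singletonMap("pid", "4321"),
                ""));
    }
}
```

In this picture, a remediation command can reuse a value captured during injection (here the hypothetical pid) simply by referring to it with $FI_ADD_INFO_pid.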

Requests and Reports

The Requests & Reports page provides insight into the tasks running during fault execution, fault remediation and triggering of scheduled jobs. Mangle creates tasks that transition to one of the stages: NOT STARTED, IN_PROGRESS, COMPLETED, FAILED.

Processed Requests

It provides details of the tasks executed by Mangle.

Important fields of Mangle tasks

Task Name: Name of the task created for any fault execution, remediation or schedule.
Status: Will reflect one of the stages: NOT STARTED, IN_PROGRESS, COMPLETED, FAILED.
Endpoint Name: Name of the targeted endpoint during fault execution.
Task Type: Type of the task executed. For example: INJECTION or REMEDIATION
Task Description: You can get more details about the fault, fault parameters, endpoint targeted, targeted component within an endpoint etc. from this field.
Start Time: Task trigger time
End Time: Task end time

Supported operations for Mangle tasks

Click on to understand what operations are supported for a specific task.

Primarily, the operations supported are Delete, Remediate Fault and Report.

The Remediate Fault option will be enabled only if the task type is INJECTION and the status is set to COMPLETED.

Delete is not supported for tasks created through scheduled jobs.
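
The two rules above can be written down as simple predicates. The sketch below only restates them and is not taken from Mangle's code. The class TaskOperationsSketch and its enums are hypothetical mirrors of the values shown in the data grid, with NOT_STARTED spelled with an underscore because Java identifiers cannot contain spaces.

```java
// Hypothetical illustration of the UI rules, not Mangle's implementation.
final class TaskOperationsSketch {

    enum TaskType { INJECTION, REMEDIATION }

    enum TaskStatus { NOT_STARTED, IN_PROGRESS, COMPLETED, FAILED }

    // Remediate Fault is offered only for injection tasks that have completed.
    static boolean canRemediate(TaskType type, TaskStatus status) {
        return type == TaskType.INJECTION && status == TaskStatus.COMPLETED;
    }

    // Delete is not offered for tasks created through scheduled jobs.
    static boolean canDelete(boolean createdByScheduledJob) {
        return !createdByScheduledJob;
    }

    public static void main(String[] args) {
        System.out.println(canRemediate(TaskType.INJECTION, TaskStatus.COMPLETED));   // true
        System.out.println(canRemediate(TaskType.INJECTION, TaskStatus.IN_PROGRESS)); // false
    }
}
```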

Refreshing the Mangle task data grid

Click on the refresh icon to sync the Mangle task data grid with the current status.

Scheduled Jobs

Scheduled Jobs data grid lists all the schedules available on Mangle.

Important fields of schedules

ID: Contains id of the schedule.
Job Type: Type of the schedule. For example: CRON, SIMPLE
Scheduled At: Recurrence and Time at which the schedule will be triggered. If job type is CRON, it shows a cron expression and if the job type is SIMPLE, it shows the epoch time in milliseconds.
Status: Status of the schedule. Will reflect one of the values: INITIALIZING, CANCELLED, SCHEDULED, FINISHED, PAUSED, SCHEDULE_FAILED
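
For SIMPLE jobs, the Scheduled At value is an epoch time in milliseconds, which is hard to read at a glance. The sketch below converts such a value to a UTC timestamp. The class ScheduledAtSketch and the sample value are hypothetical and only illustrate the conversion.

```java
import java.time.Instant;

// Hypothetical helper, not part of Mangle.
final class ScheduledAtSketch {

    // Converts an epoch-milliseconds string, as shown for SIMPLE jobs, to a UTC instant.
    static Instant toInstant(String scheduledAt) {
        return Instant.ofEpochMilli(Long.parseLong(scheduledAt.trim()));
    }

    public static void main(String[] args) {
        // Prints 2023-11-14T22:13:20Z
        System.out.println(toInstant("1700000000000"));
    }
}
```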

Triggers of each schedule

Click on the ID link of each schedule to view all the triggers of that schedule.

Supported operations for Mangle schedules

Click on to understand what operations are supported for a Scheduled Job.

Primarily, the operations supported are Cancel, Pause, Resume, Reports, Delete, and Delete Schedule Only.

Refreshing the schedule data grid

Click on the refresh icon to sync the Mangle schedule data grid with the current status.

Logs

Click on the Logs link to open up a browser window displaying the current Mangle application log.

Relevant API Reference

For access to relevant API Swagger documentation:

Please traverse to link -----> API Documentation from the Mangle UI or access https:///mangle-services/swagger-ui.html#/scheduler-controller