Target not available #690

Closed
wants to merge 88 commits into from

Conversation

allwyn-pradip
Contributor

This would fix the priority not being available when creating a plan.

diegocepedaw and others added 30 commits May 28, 2021 16:05
…ification (linkedin#618)

* create endpoint to build and render messages

* update sender tests
* change cache plan delete log level (linkedin#619)

* skip error logging if iris bot fails to send messages to a channel it is not in (linkedin#621)

* skip error logging if iris bot fails to send messages to a channel it is not in

* removed redundant slack api call and added a warning

* added a return statement to keep the event from being counted as an error in the metrics

Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz>

* rackspace webhook support (linkedin#405)

* Start rackspace webhook impl by copying alertmanager

* Plumb plan name through URL parameter for rackspace webhook

* add test for rackspace webhook (WIP)

* fix rackspace webhook test data

* rackspace webhook should parse plan inside its class

This detail is only relevant to the rackspace webhook class and doesn't belong
in api.py. This just reverts the changes in api.py and adds support for parsing
the plan in the rackspace webhook class.

* deduplicate webhook code with class inheritance

authored-by: Patrick Baxter <pb@coreos.com>

* Add support for custom Slack formatting via attachments/blocks (linkedin#624)

* Update iris_slack.py

* Update iris_slack.py

* bump version

* fix tests and incident id insert in slack message

* remove extra arguments (linkedin#626)

* added Twilio number override mechanism (linkedin#627)

* skip error logging if iris bot fails to send messages to a channel it is not in

* removed redundant slack api call and added a warning

* added a return statement to keep the event from being counted as an error in the metrics

* added Twilio number override mechanism

* minor change in case application_override_mapping is not defined in the default config

Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz>

Co-authored-by: ddurruty-li <85372760+ddurruty-li@users.noreply.github.com>
Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz>
Co-authored-by: Patrick Baxter <patrickbx@gmail.com>
Co-authored-by: Luke Young <bored-engineer@users.noreply.github.com>
* remove experimental-sender changes from master

* Update __init__.py

* check for active mailing_list target (linkedin#630)

* use app's category default if user's not defined

* Iris-message-processor cluster management

* incorporate suggestions

* remove redundant line

* remove internal app allowlist

* Update __init__.py

* remove non-inclusive language (linkedin#639)

* remove non-inclusive language (linkedin#635)

* master -> leader in config (linkedin#637)

* remove non-inclusive language

* master -> leader in config

* restore sphinx required master_doc (linkedin#638)

* remove non-inclusive language

* master -> leader in config

* restore sphinx required master_doc

Co-authored-by: ddurruty-li <85372760+ddurruty-li@users.noreply.github.com>
Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz>
Co-authored-by: Patrick Baxter <patrickbx@gmail.com>
Co-authored-by: Luke Young <bored-engineer@users.noreply.github.com>
* include custom sender addresses for build_message

* add endpoint to fetch plan aggregation settings
…kedin#656)

* return device ids with target contact

* support multi-recipient msgs in external sender
* build messages for dynamic plans

* remove unused variables
* link ui to external sender if enabled

* fix wording of comment

* add external sender into incident target search
* forward twilio deliver status to external sender

* handle external message responses

* set X-IRIS-INCIDENT header
* still display incidents if sender can't be reached

* add external sender peer count endpoint

* flake8
linkedin#671)

* split external incident & notification handling cfg
* db setup instruction (linkedin#646)

`schema_0.sql` contains `drop table` statements, which may not be obvious.

I ran into this when using a Docker image with `DOCKER_DB_BOOTSTRAP=1` (which loads the schema and dummy data) and wondered why I lost all my custom plans and templates after the Iris container restarted.

Additionally, provide an alternative way of removing ONLY_FULL_GROUP_BY. This is useful when running MySQL Docker images, e.g. the oracle/mysql or Bitnami MariaDB images. It is easier than modifying the config file or changing the global server configuration when running in a container (e.g. Docker or Kubernetes).

* use DB port from the config in image entrypoint (linkedin#650)

* until now, 3306 was hardcoded
* port in config file stays optional for backward compatibility

Co-authored-by: Michal Zubac <michal.zubac@inuits.eu>

* rebasing container image to ubuntu:20.04 (linkedin#649)

* rebasing container image to ubuntu:20.04

* switching to Python v3
  * which was driven by rsa>=3.1.4 -> oauth2client==1.4.12, it no longer supports Python v2
* adding ops/packer Makefile
  * to automate steps only found in README.md
  * done just for the docker image build
* instructing packer to clear ENTRYPOINT from original ubuntu image
  * CMD was not working with the /bin/sh in place of ENTRYPOINT

* fixes to actually make it work with Python 3.8

* correct plugin references for uwsgi
* fix of execv inputs to match validation, which changed in Python 3.6
* added missing mysql client package, which is used in the entrypoint
* not using 'gevent' parameter for uwsgi
  * it causes "DAMN ! worker N (pid: NNN) died :( trying respawn ..."
  * not sure why this happens for uwsgi+python3+gevent

Co-authored-by: Michal Zubac <michal.zubac@inuits.eu>

* add max len to webhook context too long response (linkedin#653)

* Update __init__.py

* Create __init__.py

* Support multiple custom incident handler hooks (linkedin#664)

Refactor handling of custom incident handlers to support multiple handlers.

Retain compatibility with previous config syntax
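
As a rough illustration of the refactor described above, the sketch below accepts either a single handler or a list of handlers from the config and fans each incident out to all of them. The config key names and the `HandlerDispatcher` class are hypothetical stand-ins, not the names used in Iris; only `process_create()` comes from the commit text.

```python
def handler_module_names(config):
    """Return a list of handler module paths from either config form.

    Both key names are hypothetical: "custom_incident_handler_modules"
    stands in for a list-valued setting and
    "custom_incident_handler_module" for the older single-value one.
    """
    modules = config.get("custom_incident_handler_modules")
    if modules is None:
        legacy = config.get("custom_incident_handler_module")
        modules = [legacy] if legacy else []
    return list(modules)


class HandlerDispatcher:
    """Forward each new incident to every configured handler, in order."""

    def __init__(self, handlers):
        self.handlers = list(handlers)

    def process_create(self, incident):
        for handler in self.handlers:
            handler.process_create(incident)
```

Normalizing to a list at load time keeps the dispatch loop identical for old and new configs.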

* Calculate incident priority for incident creation hook

- Generate priority by looking for the highest severity priority across
  all of the plan notifications

- Set this as a new priority field within the incident details as passed
  to the process_create() hook
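
The priority calculation described above can be sketched roughly as follows, assuming each plan notification carries a `priority` name and that severity is ordered urgent > high > medium > low. The function name, the `PRIORITY_RANK` table, and the dict shapes are illustrative assumptions rather than the actual implementation.

```python
# Hypothetical severity ranking; a higher number means more severe.
PRIORITY_RANK = {"low": 0, "medium": 1, "high": 2, "urgent": 3}


def highest_priority(notifications):
    """Return the most severe priority name across plan notifications.

    `notifications` is assumed to be an iterable of dicts with a
    "priority" key. Returns None when no known priority is present.
    """
    known = [n["priority"] for n in notifications
             if n.get("priority") in PRIORITY_RANK]
    if not known:
        return None
    return max(known, key=PRIORITY_RANK.__getitem__)


# The result is stored as a "priority" field on the incident details
# that are handed to the process_create() hook.
incident = {"id": 1, "plan": "demo-plan"}
incident["priority"] = highest_priority(
    [{"priority": "low"}, {"priority": "high"}, {"priority": "medium"}]
)
```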

* Make application name in hook incident data support overwritten app name

When creating test incidents within the API, the application given to the
incident creation hooks is "iris" rather than the application being used.

Fix this such that the hooks receive the overwritten application name.

Also: move definition of `app` to the parent block in the function so it
is more obvious this variable is used later on.

* Don't prepend https:// to proxy hostnames (linkedin#668)

Historically, this approach has always worked because http forward proxies
generally only listen on http:// (not https://) and urllib3 has not supported
connecting to a http proxy via https:// so it has always ignored the scheme.

However, as of urllib3 >= 1.26 or so, urllib3 does support and attempt connecting
to proxies via https:// (if this scheme is provided) and it raises an exception
if the proxy only listens on http://

Fix this by no longer enforcing a http:// prefix to proxy hostnames. If the user
desires connecting to a https:// proxy, this prefix can be provided within Iris's
configuration.
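
A minimal sketch of the idea, assuming the proxy is configured as a host and a port and passed on as a requests-style `proxies` mapping. The helper name `build_proxies` is hypothetical; the point is only that no scheme is forced onto the configured host.

```python
def build_proxies(host, port):
    """Build a proxies mapping without forcing a scheme onto the host.

    If `host` already carries a scheme (for example
    "https://proxy.example.com"), it is passed through as configured.
    Otherwise the bare "host:port" form is used and the choice of
    scheme is left to the HTTP client library.
    """
    address = "%s:%s" % (host, port)
    return {"http": address, "https": address}
```

With this shape, an operator who really wants an https:// proxy writes the scheme into the configured host, and everyone else gets the client's default behaviour.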

* remove old custom_incident_handler_dispatcher

Co-authored-by: ddurruty-li <85372760+ddurruty-li@users.noreply.github.com>
Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz>
Co-authored-by: Patrick Baxter <patrickbx@gmail.com>
Co-authored-by: Luke Young <bored-engineer@users.noreply.github.com>
Co-authored-by: Witold Baryluk <witold.baryluk+github@gmail.com>
Co-authored-by: mighq <contact@mighq.net>
Co-authored-by: Michal Zubac <michal.zubac@inuits.eu>
Co-authored-by: Joe Gillotti <joe@u13.net>
Co-authored-by: Joe Gillotti <jgillotti@linkedin.com>
* process ldap and oncall syncs concurrently (linkedin#677)

Co-authored-by: ddurruty-li <85372760+ddurruty-li@users.noreply.github.com>
Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz>
Co-authored-by: Patrick Baxter <patrickbx@gmail.com>
Co-authored-by: Luke Young <bored-engineer@users.noreply.github.com>
Co-authored-by: Witold Baryluk <witold.baryluk+github@gmail.com>
Co-authored-by: mighq <contact@mighq.net>
Co-authored-by: Michal Zubac <michal.zubac@inuits.eu>
Co-authored-by: Joe Gillotti <joe@u13.net>
Co-authored-by: Joe Gillotti <jgillotti@linkedin.com>
diegocepedaw and others added 20 commits March 29, 2022 10:48
* use locks with api cache

* add ability to filter incidents by claimed

* use gevent lock directly
* import gevent.lock explicitly in cache
* use gevent BoundedSemaphore instead of Semaphore
* init api cache immediately
* set iris client init log to debug
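
The locking commits above can be pictured with a small sketch like the one below. It is not the Iris cache: the `LockedCache` class and its methods are made up for illustration, and `threading.Lock` is used only to keep the example stdlib-only.

```python
import threading


class LockedCache:
    """Dict-backed cache whose reads and refreshes share one lock."""

    def __init__(self, lock=None):
        # In a gevent application one would pass
        # gevent.lock.BoundedSemaphore(1) here instead. A bounded
        # semaphore raises an error if it is released more times than
        # it was acquired, which surfaces unbalanced release calls.
        self._lock = lock if lock is not None else threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def refresh(self, loader):
        # Build the new contents outside the lock, then swap them in.
        fresh = dict(loader())
        with self._lock:
            self._data = fresh
```

Loading outside the lock keeps the critical section short, so readers are blocked only for the swap itself.
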
bilbof mentioned this pull request Apr 11, 2022
bilbof added a commit to bilbof/iris that referenced this pull request Apr 11, 2022
The api makes use of [gevent], a coroutine based networking
library which relies heavily on monkey patching the stdlib.

From the [gevent.monkey] docs:
> Warning Patching too late can lead to unreliable behaviour
> (for example, some modules may still use blocking sockets) or even errors.

This appears to have happened here. Thanks to @allwyn-pradip for
pointing me at the right file in PR
linkedin#690.

Resolves linkedin#686, linkedin#699, linkedin#644.

Blog on gevent: https://eng.lyft.com/what-the-heck-is-gevent-4e87db98a8
> In the case of gevent — monkey patching has to be the absolute first thing a process does

[gevent]: https://www.gevent.org/index.html
[gevent.monkey]: https://www.gevent.org/api/gevent.monkey.html
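
To make the ordering requirement in the commit message above concrete, here is a small diagnostic sketch. The helper name and the module list are assumptions chosen for illustration; the gevent calls are shown only as comments so the snippet stays stdlib-only.

```python
import sys

# In a gevent-based entry point, these would be the very first
# statements of the process (gevent is a third-party package):
#
#     from gevent import monkey
#     monkey.patch_all()

# An illustrative subset of stdlib modules that gevent patches.
PATCH_SENSITIVE = ("socket", "ssl", "select", "threading", "subprocess")


def imported_too_early(loaded=None):
    """Return patch-sensitive modules that are already imported.

    Called just before patching, a non-empty result is a hint that
    some code may already hold references to unpatched objects.
    """
    loaded = sys.modules if loaded is None else loaded
    return [name for name in PATCH_SENSITIVE if name in loaded]
```

A check like this could be logged at startup to spot a module that imports networking code ahead of the patch call.
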
diegocepedaw pushed a commit that referenced this pull request Apr 11, 2022
The api makes use of [gevent], a coroutine based networking
library which relies heavily on monkey patching the stdlib.

From the [gevent.monkey] docs:
> Warning Patching too late can lead to unreliable behaviour
> (for example, some modules may still use blocking sockets) or even errors.

This appears to have happened here. Thanks to @allwyn-pradip for
pointing me at the right file in PR
#690.

Resolves #686, #699, #644.

Blog on gevent: https://eng.lyft.com/what-the-heck-is-gevent-4e87db98a8
> In the case of gevent — monkey patching has to be the absolute first thing a process does

[gevent]: https://www.gevent.org/index.html
[gevent.monkey]: https://www.gevent.org/api/gevent.monkey.html
Contributor

bilbof commented Apr 11, 2022

@allwyn-pradip you should be able to close this PR now that #703 is in. Thanks - this PR showed how to fix the issue.

4 participants