forked from apache/pulsar
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pulsar-function-go] (WIP) Add statistics and Prometheus to Go Function instances for production readiness #2
Open
devinbost
wants to merge
134
commits into
go-features
Choose a base branch
from
go-features-prometheus
base: go-features
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ng (apache#6101) *Motivation* Related to apache#6084 apache#5400 introduces `customRuntimeOptions` in function details. But the description was wrong. The mistake was probably introduced by bad merges. *Modification* Fix the argument and description for `deadletterTopic` and `customRuntimeOptions`.
### Motivation Pulsar dashboard is deprecated. We need to remove it from sidebar. ### Modifications 1. Remove Pulsar dashboard from the sidebar (for both master and 2.5.0 release); 2. Add related note and link to Pulsar dashboard in Pulsar manager.
…pache#6063) ### Motivation Some of the classes in the pulsar-functions module had a mixture of the following lombok annotations: ``` @DaTa @Setter @getter @EqualsAndHashCode @tostring ``` The [@DaTa](https://projectlombok.org/features/Data) annotation includes all other annotations: > All together now: A shortcut for @tostring, @EqualsAndHashCode, @getter on all fields, @Setter on all non-final fields, and @requiredargsconstructor! ### Modifications Removed `@Setter`, `@Getter`, `@EqualsAndHashCode`, and '@tostring' if the `@Data` annotation was also present
…ation on how to work with schemas (apache#6089) ### Motivation When I read the documentation on how to work with schemas I saw that some code blocks are not highlighted, because of incorrect language identifiers. ### Modifications Correct language identifiers were applied.
…pache#5915) * Allow to enable/disable delyed delivery for messages on namespace Signed-off-by: xiaolong.ran <rxl@apache.org> * add isDelayedDeliveryEnabled function Signed-off-by: xiaolong.ran <rxl@apache.org> * add delayed_delivery_time process logic Signed-off-by: xiaolong.ran <rxl@apache.org> * add test case Signed-off-by: xiaolong.ran <rxl@apache.org> * update admin cli docs Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * update import lib Signed-off-by: xiaolong.ran <rxl@apache.org> * avoid import * Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * remove unuse code Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * add test case for delayed delivery messages Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org>
Fixes apache#6114 Modifications The swagger files were generated for the version 2.5.0 according to the instruction provided by @tuteng
…pache#6095) Fixes apache#6094 ### Motivation Incorrect picture in pulsar function-overview ### Modifications Fix the last word "in" to "pudding".
…ache#6050) Fixes apache#3681 ### Motivation The client library doc content is out of date. ### Modifications 1. Pulsar supports more clients, add links to the new clients. 2. Remove redundant information, and will update each client respectively. 3. Update Pulsar client go info. 4. Proofread parts of the Python/Go client content while reading. 5. Fixes apache#3681 by adding examples in Python client. 6. Update Python API.
Fixes apache#5620 Added support for Python native logging within the Pulsar Client Python wrapper. * added a new Logger implementation that forwards the logging to python * updated the python module to allow setting the logger on a `pulsar.Client` via the `ClientConfiguration` object
…a key (apache#6065) Fixes apache#6056 ### Motivation There is no doc telling users how to choose partition when you use a key. ### Modifications Add example on how to choose partition when using a key.
I noticed the documentation around batching didn't clarify some important details about how batching works in Pulsar, and particularly how it interacts with acknowledgements and redeliveries. This is my attempt to provide some clarity.
…Pulsar standalone (apache#5998)
…r configurable. (apache#6028) Fixes apache#5690 ### Motivation The default type name cannot be compatible with Elasticsearch versions both before 6.2 and after 7.0, so this change makes it configurable. ### Modifications Make type name configurable, and set the default value to "_doc" to make it compatible with Elasticsearch versions after 6.2 by default. ### Verifying this change Added unit tests and local integration tests.
* [CPP] Compile fixes for ARM cpus * Fixed formatting
Fixed: apache#6115 apache#6110 apache#6108 apache#6100 ### Motivation There are some errors in the content of the current document due to the adjustment of the architecture, so we will fix it. ### Modifications * The display is inconsistent with the downloaded content. * Document display error * Add io-use.md for version 2.5.0
### Motivation Available permits of ZeroQueueConsuemer must be 1 or less, however ZeroQueueConsuemer using listener may be greater than 1. ### Modifications If listener is processing message, ZeroQueueConsumer doesn't send permit when it reconnect to broker. ### Reproduction 1. ZeroQueueConsuemer using listener consume a topic. 2. Unload that topic( or restart a broker) when listener is processing message. 3. ZeroQueueConsumer sends permit when it reconnect to broker. https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L133 4. ZeroQueueConsumer also sends permit when finished processing message. https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L163 5. Available permits become 2.
* Fix typo in Reader documentation. Proker -> Broker. * update copy to be consistent with rest of docs
### Motivation Current help description of `bin/pulsar tokens` is redundancy when only specified one subcommand. ### Modifications Print corresponding usage according to the parsed command.
Fix some comment typos in the pulsar-broker module, no real code changes.
… used (apache#5883) Co-authored-by: Sijie Guo <guosijie@gmail.com>
) *Motivation* Fixes apache#5997 Fixes apache#6079 A regression was introduced in apache#5486. If websocket service as running as part of pulsar standalone, the cluster data is set with null service urls. This causes service url is not set correctly in the pulsar client and an illegal argument exception ("Param serviceUrl must not be blank.") will be thrown. *Modifications* 1. Pass `null` when constructing the websocket service. So the local cluster data can be refreshed when creating pulsar client. 2. Set the cluster data after both broker service and web service started and ports are allocated.
### Motivation *A change that had to be moved to 2.5.0 (because it [changed the .proto file](apache#5230 (comment))), is still listed as having been done in 2.4.2 in the release notes.* ### Modifications *Move the change from 2.4.2 to 2.5.0.*
### Motivation *Parts of the docs for Pulsar Functions are redundant.* ### Modifications *Remove parts of the docs for Pulsar Functions that are redundant.*
Fixes apache#5116 ### Motivation Based on the request in apache#5116, adding this flag can help reduce ambiguity and increase flexibility. By default, the function passes source message properties. ### Modifications - add flags in pulsar admin client tools - update functions proto and functionConfig to contain the flag - update the pulsar sink to pass source message property if the flag is set
…apache#6315) Fixes #apache#6314 ### Motivation Pulsar Manager do not work if Pulsar authentication is enabled. ### Modifications pulsar-manager-configmap.yaml was created in order to allow configuration of the enviroment properties in values.yaml
…pache#6112) Fixes apache#6111 ### Motivation Add Debezium Mongodb for io-connectors . ### Modifications
…ns to deploy functions (apache#6191) ### Motivation Fixes feature enhancement request apache#6143: Currently, there are quite a few undocumented steps that are needed to be performed manually in order to make sure that the functions can be submitted as pods in K8s runtime environment. It would be much better if this process would be automated. #### Proposed solution: Automate this process via helm install and update the helm charts with templates. ### Modifications I've added an additional `functionsAsPods` filed in extra components inside the values file. If the setting is set to `yes`, then it would add `serviceAccount` to the broker deployment. It will also add the rbac policy to give the brokers permissions to deploy functions. The policies can be found in the new `broker-rbac.yaml` template file. Moreover, it will also change the `functions_worker` settings and set the function runtime factory setting that can be found inside `broker-configmap.yaml`. ### Verifying this change 1) Set `functionsAsPods: yes` inside helm values yaml file. 2) Follow the instructions on how deploying helm and run: `helm install pulsar --values pulsar/values-mini.yaml ./pulsar/`. 3) Wait until all the services are up and running. 4) Set up tenant, namespace. 5) Create a function, sink and source and submit it using the CLI to make sure the pods are running alongside the Pulsar cluster. In addition, set up such a flow where the data is flowing from source to topics, the processed by a function and sink outputs the data 6) Push data into cluster through the source and make sure it comes out of the sink into destination. There shouldn't be any errors in the logs of brokers, bookie, sources, sinks and functions. #### Modules affected: The changes in the PR are affecting the deployment using the helm charts. Now the if the flag `functionsAsPods` is set to `yes` inside the `values.yaml. file, the functions would run as pods. ### Documentation Currently, the documentations explaining the helm chart deployment process is lacking and this should be updated.
### Motivation Currently, bundle split splits the bundle into two parts of the same size. When there are fewer topics, bundle split does not work well. The topic assigned to the broker according to the topic name hash value, hashing is not effective in a small number of topics bundle split. So, this PR introduces an option(-balance-topic-count) for bundle split. When setting it to true, the given bundle splits to 2 parts, each part has the same amount of topics. And introduce a new Load Manager implementation named `org.apache.pulsar.broker.loadbalance.impl.BalanceTopicCountModularLoadManager`. The new Load Manager implementation splits bundle with balance topics count, others are not different from ModularLoadManagerImpl.
Fix the misleading callback examples for send_async after apache#4811
`avroSchema.getSchemaInfo().getSchema()` in `src/test/java/org/apache/pulsar/client/impl/schema/AvroSchemaTest.java` returns a JSON object. `schemaJson` compares with hard-coded JSON String. However, the order of entries in `schemaJson` is not guaranteed so the json results can contain entries in any order. This PR proposes to use JSONAssert and modify the corresponding json test assertions so that the test is more stable. ### Motivation Using JSONAssert and modifying the corresponding json test assertions so that the test is more stable. ### Modifications Adding `assertJSONEquals` method and replacing `assertEquals` with it in tests `testNotAllowNullSchema` and `testAllowNullSchema`.
*Motivation* `PULSARBOT_TOKEN` is configured in apache/pulsar to trigger CI runs
…g Pulsar client (apache#6277) * Attempt at fixing deadlock during client.close() * Fixed formatting * Detach the worker thread in the destructor of ExecutorService if it is still unable to be joined * Possible formatting fixes
…ial duplicated messages and non-duplicated messages into a batch. (apache#6326) Fixes apache#6273 Motivation The main reason for apache#6273 is combining potential duplicated messages and non-duplicated messages into a batch. So need to flush the potential duplicated message first and then add the non-duplicated messages to a batch.
Upgrade ZK to latest stable version. In particular we need to include: - Split brain on log disk full https://issues.apache.org/jira/browse/ZOOKEEPER-3701 - Data loss after upgrading standalone ZK server 3.4.14 to 3.5.6 with snapshot.trust.empty=true https://issues.apache.org/jira/browse/ZOOKEEPER-3644
### Motivation In pulsar 2.5.0 deploying window functions fails because its class doesn't pass validation. The behavior looks the same in current master. ### Modifications Add `WindowFunction.class` to the list of allowed function classes
…pache#6187) Fixes apache#5904 ### Motivation Pulsar supports unload a non-partitioned-topic or a partition of a partitioned topic. If there has a partitioned topic with too many partitions, users need to get all partition and unload them one by one. We need to support unload all partition of a partitioned topic.
…ctions for production-readiness (apache#6104) This PR is to provide integration tests that test execution of Go functions that are managed by the Java FunctionManager. This will allow us to test things like behavior during function timeouts, heartbeat failures, and other situations that can only be effectively tested in an integration test. Master issue: apache#4175 Fixes issue: apache#6204 ### Modifications We must add Go to the integration testing logic. We must also build the Go dependencies into the test Dockerfile to ensure the Go binaries are available at runtime for the integration tests.
…6027) ### Motivation Fixes apache#5999 ### Modifications Add the logic to handle the blank cluster name.
…er OOM (apache#6178) Motivation Introduce maxMessagePublishBufferSizeInMB configuration to avoid broker OOM. Modifications If the processing message size exceeds this value, the broker will stop read data from the connection. When available size > half of the maxMessagePublishBufferSizeInMB, start auto-read data from the connection.
…che#6310) Fixes apache#6045 apache#6281 ### Motivation Enable get precise backlog and backlog without delayed messages. ### Verifying this change Added new unit tests for the change.
Fixes apache#5560 ### Motivation Currently, Pulsar SQL can't read the keyValue schema data. This PR added support Pulsar SQL reading messages with a key-value schema. ### Modifications Add KeyValue schema support for Pulsar SQL. Add prefix __key. for the key field name.
apache#6339) Motivation To avoid get partition metadata while the topic name is a partition name. Currently, if users want to skip all messages for a partitioned topic or unload a partitioned topic, the broker will call get topic metadata many times. For a topic with the partition name, it is not necessary to call get partitioned topic metadata again.
…aml (apache#6340) Fixes apache#6338 ### Motivation This commit started while I was using helm in my local minikube, noticed that there's a mismatch between `values-mini.yaml` and `values.yaml` files. At first I thought it was a copy/paste error. So I created apache#6338; Then I looked into the details how these env-vars[ were used](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L36), found out its ok to use `PULSAR_MEM` as an alternative. But it introduce problems: 1. Since `BOOKIE_GC` was not defined , the default [BOOKIE_EXTRA_OPTS](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L39) will finally use default value of `BOOKIE_GC`, thus would cover same the JVM parameters defined prior in `PULSAR_MEM`. 2. May cause problems when bootstrap scripts changed in later dev, better to make it explicitly. So I create this pr to solve above problems(hidden trouble). ### Modifications As mentioned above, I've made such modifications below: 1. make `BOOKIE_MEM` and `BOOKIE_GC` explicit in `values-mini.yaml` file. Keep up with the format in`values.yaml` file. 2. remove all print-gc-logs related args. Considering the resource constraints of minikube environment. The removed part's content is `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest` 3. leave `PULSAR_PREFIX_dbStorage_rocksDB_blockCacheSize` empty as usual, as [conf/standalone.conf#L576](https://github.com/apache/pulsar/blob/df152109415f2b10dd83e8afe50d9db7ab7cbad5/conf/standalone.conf#L576) says it would to use 10% of the direct memory size by default.
The key shared policy does not support setting the maximum key hash range, so fix the java doc.
…omission of methods for gRPC server registration in generated gRPC files for Go. (apache#4175) Generated updated gRPC files that contain service registration methods for creating gRPC service in Go. Also, upgraded proto version to 3. (apache#4175) Fixed build errors by prefixing pulsar-function-go/pb with pb alias. (apache#4175). Added instanceControlServicer.go as the servicer responsible for serving the gRPC service for the Go Function instances (apache#4175). Rough draft right now. Added changes to show intent behind passing port value to Start in function.go. Also, added some code to support healthcheck and added methods to support instanceConrolServicer. Just needed to commit changes to allow reproducible test errors. (apache#4175). Updated function.go Start method to make it more clear where we need to provide a port value (apache#4175). Added port and expectedHealthCheckInterval to use of function context. Updated all references. (apache#4175) Added Apache license to gRPC-generated files in attempt to get license check test to pass (apache#4175). Created instanceControlServicer_test.go to test gRPC server and validate that HealthCheck method returns true as expected (apache#4175). Fixed bug in FunctionContext (and context_test.go) where the inputTopics field was being referenced when it wasn't getting populated. Updated GetInputTopics method to get input topics from the source location (apache#4175). Fixed bug in FunctionContext (and context_test.go) where the inputTopics field was being referenced when it wasn't getting populated. Updated GetInputTopics method to get input topics from the source location. (Should have been part of previous commit.) Also, added expectedHealthCheckInterval to conf.yaml for testing. (apache#4175). Fixed license formatting by running mvn license:format (apache#4175). Added logic and tests to allow healthCheck to kill instances that aren't receiving their regular health checks. Still needs an end-to-end test involving FunctionManager to check for possible issues that could kill instances incorrectly (apache#4175). Removed inputTopics field from FunctionContext (apache#4175). Adding the progress I've made so far on migrating the Prometheus code to Go... currently blocked due to missing methods from the Go client. Waiting for information from the Prometheus maintainers to find a workaround. (apache#4175). Fixed license check. (apache#4175) Reverting the last two commits since they should go into a separate PR. (apache#4174). Re-added test file that was accidentially deleted (apache#4175). Added a few comments to make review easier (apache#4175). Made minor (non-functional) changes as per PR review (apache#4175). Fixed print statements (apache#4175). Re-added comment after getting maven license formatting correct (apache#4175). Removed comment that I forgot to remove (apache#4175). Fixed formatting issues for style check (apache#4175). Updated gRPC test to no longer use deprecated method (apache#4175). Fixed more formatting issues by using goimports (apache#4175). Fixed even more formatting issues (apache#4175). Fixed yet even more formatting issues (apache#4175). Added statistics functionality for supporting Prometheus and stats and status commands on Go functions. Needs testing. Also, needs review of specific locations of stats method calls to ensure we're collecting data in the right places. Also, still needs the 1m interval stats to be created. Upstream Prometheus changes prevented us from using the existing approaches for collecting these stats.
devinbost
force-pushed
the
go-features-prometheus
branch
from
February 17, 2020 20:40
8f82e39
to
2acad27
Compare
The pr had no activity for 30 days, mark with Stale label. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Master issue: apache#4175
This PR also depends on the code in apache#6031 .
This PR is to add statistics (to enable calls to get function status, get function stats, and get other metrics) for Pulsar Admin usage and for sending to Prometheus.
Because Prometheus Go did not support some of the ways that we depended on to interact with Prometheus in the Python and Java Pulsar code, this change was more significant and diverges slightly from the approaches used for the other parts of the Pulsar codebase. However, based on feedback from the core maintainers of Prometheus, there is a risk that they may deprecate the other ways that we're using their library. So, the approaches used in this PR should serve as the new standard. For more information about the issues with Prometheus, please see the discussion here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/prometheus-users/NpkERPC17H4
Motivation
The current implementation of Go functions in Pulsar is missing key statistical information and other information that is critical for determining the health and reliability of Go functions in production. This PR solves that problem.
Does this pull request potentially affect one of the following parts:
It adds functionality that didn't previously exist for Go functions.
It adds functionality that didn't previously exist for Go functions.
It adds functionality that didn't previously exist for Go functions.
Documentation
Prometheus metrics were changed slightly. We needed to replace many of the Prometheus Counters with Gauges due to limitations in the Prometheus Go client.
More documentation will need to be created as Go functions become more production-ready.