Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pulsar-function-go] (WIP) Add statistics and Prometheus to Go Function instances for production readiness #2

Open
wants to merge 134 commits into
base: go-features
Choose a base branch
from

Conversation

devinbost
Copy link
Owner

Master issue: apache#4175
This PR also depends on the code in apache#6031 .

This PR is to add statistics (to enable calls to get function status, get function stats, and get other metrics) for Pulsar Admin usage and for sending to Prometheus.
Because Prometheus Go did not support some of the ways that we depended on to interact with Prometheus in the Python and Java Pulsar code, this change was more significant and diverges slightly from the approaches used for the other parts of the Pulsar codebase. However, based on feedback from the core maintainers of Prometheus, there is a risk that they may deprecate the other ways that we're using their library. So, the approaches used in this PR should serve as the new standard. For more information about the issues with Prometheus, please see the discussion here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/prometheus-users/NpkERPC17H4

Motivation

The current implementation of Go functions in Pulsar is missing key statistical information and other information that is critical for determining the health and reliability of Go functions in production. This PR solves that problem.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API: (sort of)
    It adds functionality that didn't previously exist for Go functions.
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (sort of)
    It adds functionality that didn't previously exist for Go functions.
  • The admin cli options: (sort of)
    It adds functionality that didn't previously exist for Go functions.
  • Anything that affects deployment: (possibly, though unlikely, due to new dependencies)

Documentation

  • Does this pull request introduce a new feature? (sort of)
    Prometheus metrics were changed slightly. We needed to replace many of the Prometheus Counters with Gauges due to limitations in the Prometheus Go client.
  • If yes, how is the feature documented? (not documented)
  • If a feature is not applicable for documentation, explain why?
    More documentation will need to be created as Go functions become more production-ready.

sijie and others added 30 commits January 21, 2020 09:04
…ng (apache#6101)

*Motivation*

Related to apache#6084

 apache#5400 introduces `customRuntimeOptions` in function details. But the description was wrong. The mistake was probably introduced by bad merges.

*Modification*

Fix the argument and description for `deadletterTopic` and `customRuntimeOptions`.
### Motivation
Pulsar dashboard is deprecated. We need to remove it from sidebar.

### Modifications
1. Remove Pulsar dashboard from the sidebar (for both master and 2.5.0 release);
2. Add related note and link to Pulsar dashboard in Pulsar manager.
…pache#6063)

### Motivation

Some of the classes in the pulsar-functions module had a mixture of the following lombok annotations:

```
@DaTa
@Setter	
@getter	
@EqualsAndHashCode	
@tostring
```

The [@DaTa](https://projectlombok.org/features/Data) annotation includes all other annotations:

> All together now: A shortcut for @tostring, @EqualsAndHashCode, @getter on all fields, @Setter on all non-final fields, and @requiredargsconstructor!


### Modifications

Removed `@Setter`, `@Getter`, `@EqualsAndHashCode`, and '@tostring' if the `@Data` annotation was also present
…ation on how to work with schemas (apache#6089)

### Motivation

When I read the documentation on how to work with schemas I saw that some code blocks are not highlighted, because of incorrect language identifiers.

### Modifications

Correct language identifiers were applied.
…pache#5915)

* Allow to enable/disable delyed delivery for messages on namespace

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add isDelayedDeliveryEnabled function

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add delayed_delivery_time process logic

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add test case

Signed-off-by: xiaolong.ran <rxl@apache.org>

* update admin cli docs

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* update import lib

Signed-off-by: xiaolong.ran <rxl@apache.org>

* avoid import *

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* remove unuse code

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add test case for delayed delivery messages

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>
Fixes apache#6114

Modifications
The swagger files were generated for the version 2.5.0 according to the instruction provided by @tuteng
…pache#6095)

Fixes apache#6094 


### Motivation

Incorrect picture in pulsar function-overview

### Modifications

Fix the last word "in" to "pudding".
…ache#6050)

Fixes apache#3681 

### Motivation
The client library doc content is out of date.

### Modifications
1. Pulsar supports more clients, add links to the new clients.
2. Remove redundant information, and will update each client respectively.
3. Update Pulsar client go info.
4. Proofread parts of the Python/Go client content while reading.
5. Fixes apache#3681 by adding examples in Python client.
6. Update Python API.
Fixes apache#5620

Added support for Python native logging within the Pulsar Client Python wrapper.

* added a new Logger implementation that forwards the logging to python
* updated the python module to allow setting the logger on a `pulsar.Client` via the `ClientConfiguration` object
…a key (apache#6065)

Fixes apache#6056 

### Motivation
There is no doc telling users how to choose partition when you use a key.

### Modifications
Add example on how to choose partition when using a key.
I noticed the documentation around batching didn't clarify some important details about how batching works in Pulsar, and particularly how it interacts with acknowledgements and redeliveries. This is my attempt to provide some clarity.
…r configurable. (apache#6028)

Fixes apache#5690

### Motivation

The default type name cannot be compatible with Elasticsearch versions both before 6.2 and after 7.0, so this change makes it configurable.

### Modifications

Make type name configurable, and set the default value to "_doc" to make it compatible with Elasticsearch versions after 6.2 by default.

### Verifying this change

Added unit tests and local integration tests.
* [CPP] Compile fixes for ARM cpus

* Fixed formatting
Fixed:
apache#6115
apache#6110
apache#6108
apache#6100

### Motivation

There are some errors in the content of the current document due to the adjustment of the architecture, so we will fix it.

### Modifications

* The display is inconsistent with the downloaded content.
* Document display error
* Add io-use.md for version 2.5.0
### Motivation

Available permits of ZeroQueueConsuemer must be 1 or less, however ZeroQueueConsuemer using listener may be greater than 1.


### Modifications

If listener is processing message, ZeroQueueConsumer doesn't send permit when it reconnect to broker.


### Reproduction
1. ZeroQueueConsuemer using listener consume a topic.

2. Unload that topic( or restart a broker) when listener is processing message.

3.  ZeroQueueConsumer sends permit when it reconnect to broker.
https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L133

4. ZeroQueueConsumer also sends permit when finished processing message.
https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L163

5. Available permits become 2.
* Fix typo in Reader documentation.

Proker -> Broker.

* update copy to be consistent with rest of docs
### Motivation

Current help description of `bin/pulsar tokens` is redundancy when only specified one subcommand.

### Modifications

Print corresponding usage according to the parsed command.
Fix some comment typos in the pulsar-broker module, no real code changes.
… used (apache#5883)

Co-authored-by: Sijie Guo <guosijie@gmail.com>
)

*Motivation*

Fixes apache#5997
Fixes apache#6079

A regression was introduced in apache#5486. If websocket service as running as part of
pulsar standalone, the cluster data is set with null service urls. This causes
service url is not set correctly in the pulsar client and an illegal argument exception
("Param serviceUrl must not be blank.") will be thrown.

*Modifications*

1. Pass `null` when constructing the websocket service. So the local cluster data can
   be refreshed when creating pulsar client.
2. Set the cluster data after both broker service and web service started and ports are allocated.
### Motivation

*A change that had to be moved to 2.5.0 (because it [changed the .proto file](apache#5230 (comment))), is still listed as having been done in 2.4.2 in the release notes.*

### Modifications

*Move the change from 2.4.2 to 2.5.0.*
### Motivation
*Parts of the docs for Pulsar Functions are redundant.*

### Modifications
*Remove parts of the docs for Pulsar Functions that are redundant.*
Anonymitaet and others added 23 commits February 13, 2020 13:29
Fixes apache#5116 

### Motivation
Based on the request in apache#5116, adding this flag can help reduce ambiguity and increase flexibility. By default, the function passes source message properties.

### Modifications
- add flags in pulsar admin client tools
- update functions proto and functionConfig to contain the flag
- update the pulsar sink to pass source message property if the flag is set
…apache#6315)

Fixes #apache#6314

### Motivation

Pulsar Manager do not work if Pulsar authentication is enabled.

### Modifications

pulsar-manager-configmap.yaml was created in order to allow configuration of the enviroment properties in values.yaml
…pache#6112)


Fixes apache#6111 


### Motivation

Add Debezium Mongodb for io-connectors .

### Modifications
…ns to deploy functions (apache#6191)

### Motivation
Fixes feature enhancement request apache#6143:
Currently, there are quite a few undocumented steps that are needed to be performed manually in order to make sure that the functions can be submitted as pods in K8s runtime environment. It would be much better if this process would be automated.

#### Proposed solution:
Automate this process via helm install and update the helm charts with templates.

### Modifications

I've added an additional `functionsAsPods` filed in extra components inside the values file. If the setting is set to `yes`, then it would add `serviceAccount` to the broker deployment. It will also add the rbac policy to give the brokers permissions to deploy functions. The policies can be found in the new `broker-rbac.yaml` template file. Moreover, it will also change the `functions_worker` settings and set the function runtime factory setting that can be found inside `broker-configmap.yaml`.
### Verifying this change
1) Set `functionsAsPods: yes` inside helm values yaml file.
2) Follow the instructions on how deploying helm and run:
`helm install pulsar --values pulsar/values-mini.yaml ./pulsar/`. 
3) Wait until all the services are up and running.
4) Set up tenant, namespace.
5) Create a function, sink and source and submit it using the CLI to make sure the pods are running alongside the Pulsar cluster. In addition, set up such a flow where the data is flowing from source to topics, the processed by a function and sink outputs the data
6) Push data into cluster through the source and make sure it comes out of the sink into destination. There shouldn't be any errors in the logs of brokers, bookie, sources, sinks and functions.

#### Modules affected:
The changes in the PR are affecting the deployment using the helm charts. Now the if the flag `functionsAsPods` is set to `yes` inside the `values.yaml. file, the functions would run as pods.

### Documentation
Currently, the documentations explaining the helm chart deployment process is lacking and this should be updated.
### Motivation

Currently, bundle split splits the bundle into two parts of the same size. When there are fewer topics, bundle split does not work well. The topic assigned to the broker according to the topic name hash value, hashing is not effective in a small number of topics bundle split.

So, this PR introduces an option(-balance-topic-count) for bundle split.  When setting it to true, the given bundle splits to 2 parts, each part has the same amount of topics.

And introduce a new Load Manager implementation named `org.apache.pulsar.broker.loadbalance.impl.BalanceTopicCountModularLoadManager`.  The new Load Manager implementation splits bundle with balance topics count, others are not different from ModularLoadManagerImpl.
Fix the misleading callback examples for send_async after apache#4811
`avroSchema.getSchemaInfo().getSchema()`  in `src/test/java/org/apache/pulsar/client/impl/schema/AvroSchemaTest.java` returns a JSON object. `schemaJson` compares with hard-coded JSON String. However, the order of entries in `schemaJson` is not guaranteed so the json results can contain entries in any order.

This PR proposes to use JSONAssert and modify the corresponding json test assertions so that the test is more stable.

### Motivation

Using JSONAssert and modifying the corresponding json test assertions so that the test is more stable.

### Modifications

Adding `assertJSONEquals` method and replacing `assertEquals` with it in tests `testNotAllowNullSchema` and `testAllowNullSchema`.
*Motivation*

`PULSARBOT_TOKEN` is configured in apache/pulsar to trigger CI runs
…g Pulsar client (apache#6277)

* Attempt at fixing deadlock during client.close()

* Fixed formatting

* Detach the worker thread in the destructor of ExecutorService if it is still unable to be joined

* Possible formatting fixes
…ial duplicated messages and non-duplicated messages into a batch. (apache#6326)

Fixes apache#6273

Motivation
The main reason for apache#6273 is combining potential duplicated messages and non-duplicated messages into a batch. So need to flush the potential duplicated message first and then add the non-duplicated messages to a batch.
Upgrade ZK to latest stable version. In particular we need to include:

- Split brain on log disk full https://issues.apache.org/jira/browse/ZOOKEEPER-3701
- Data loss after upgrading standalone ZK server 3.4.14 to 3.5.6 with snapshot.trust.empty=true https://issues.apache.org/jira/browse/ZOOKEEPER-3644
### Motivation

In pulsar 2.5.0 deploying window functions fails because its class doesn't pass validation.
The behavior looks the same in current master.

### Modifications

Add `WindowFunction.class` to the list of allowed function classes
…pache#6187)

Fixes apache#5904 

### Motivation
Pulsar supports unload a non-partitioned-topic or a partition of a partitioned topic. If there has a partitioned topic with too many partitions, users need to get all partition and unload them one by one. We need to support unload all partition of a partitioned topic.
…ctions for production-readiness (apache#6104)

This PR is to provide integration tests that test execution of Go functions that are managed by the Java FunctionManager. This will allow us to test things like behavior during function timeouts, heartbeat failures, and other situations that can only be effectively tested in an integration test. 

Master issue: apache#4175
Fixes issue: apache#6204 

### Modifications

We must add Go to the integration testing logic. We must also build the Go dependencies into the test Dockerfile to ensure the Go binaries are available at runtime for the integration tests.
…6027)

### Motivation

Fixes apache#5999

### Modifications

Add the logic to handle the blank cluster name.
…er OOM (apache#6178)

Motivation
Introduce maxMessagePublishBufferSizeInMB configuration to avoid broker OOM.

Modifications
If the processing message size exceeds this value, the broker will stop read data from the connection. When available size > half of the maxMessagePublishBufferSizeInMB, start auto-read data from the connection.
…che#6310)

Fixes apache#6045 apache#6281 

### Motivation

Enable get precise backlog and backlog without delayed messages.

### Verifying this change

Added new unit tests for the change.
Fixes apache#5560

### Motivation

Currently, Pulsar SQL can't read the keyValue schema data. This PR added support Pulsar SQL reading messages with a key-value schema.

### Modifications

Add KeyValue schema support for Pulsar SQL. Add prefix __key. for the key field name.
apache#6339)

Motivation

To avoid get partition metadata while the topic name is a partition name.
Currently, if users want to skip all messages for a partitioned topic or unload a partitioned topic, the broker will call get topic metadata many times. For a topic with the partition name, it is not necessary to call get partitioned topic metadata again.
…aml (apache#6340)

Fixes apache#6338

### Motivation
This commit started while I was using helm in my local minikube, noticed that there's a mismatch between `values-mini.yaml` and `values.yaml` files. At first I thought it was a copy/paste error. So I created apache#6338;

Then I looked into the details how these env-vars[ were used](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L36), found out its ok to use `PULSAR_MEM` as an alternative. But it introduce problems:
1. Since `BOOKIE_GC` was not defined , the default [BOOKIE_EXTRA_OPTS](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L39)  will finally use default value of `BOOKIE_GC`, thus would cover same the JVM parameters defined prior in `PULSAR_MEM`.
2. May cause problems when bootstrap scripts changed in later dev, better to make it explicitly.

So I create this pr to solve above problems(hidden trouble).

### Modifications

As mentioned above, I've made such modifications below:
1. make `BOOKIE_MEM` and `BOOKIE_GC` explicit in `values-mini.yaml` file.  Keep up with the format in`values.yaml` file.
2. remove all  print-gc-logs related args. Considering the resource constraints of minikube environment. The removed part's content is `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest`
3. leave `PULSAR_PREFIX_dbStorage_rocksDB_blockCacheSize` empty as usual, as [conf/standalone.conf#L576](https://github.com/apache/pulsar/blob/df152109415f2b10dd83e8afe50d9db7ab7cbad5/conf/standalone.conf#L576) says it would to use 10% of the direct memory size by default.
The key shared policy does not support setting the maximum key hash range, so fix the java doc.
…omission of methods for gRPC server registration in generated gRPC files for Go. (apache#4175)

Generated updated gRPC files that contain service registration methods for creating gRPC service in Go. Also, upgraded proto version to 3. (apache#4175)

Fixed build errors by prefixing pulsar-function-go/pb with pb alias. (apache#4175).

Added instanceControlServicer.go as the servicer responsible for serving the gRPC service for the Go Function instances (apache#4175). Rough draft right now.

Added changes to show intent behind passing port value to Start in function.go. Also, added some code to support healthcheck and added methods to support instanceConrolServicer. Just needed to commit changes to allow reproducible test errors. (apache#4175).

Updated function.go Start method to make it more clear where we need to provide a port value (apache#4175).

Added port and expectedHealthCheckInterval to use of function context. Updated all references. (apache#4175)

Added Apache license to gRPC-generated files in attempt to get license check test to pass (apache#4175).

Created instanceControlServicer_test.go to test gRPC server and validate that HealthCheck method returns true as expected (apache#4175).

Fixed bug in FunctionContext (and context_test.go) where the inputTopics field was being referenced when it wasn't getting populated. Updated GetInputTopics method to get input topics from the source location (apache#4175).

Fixed bug in FunctionContext (and context_test.go) where the inputTopics field was being referenced when it wasn't getting populated. Updated GetInputTopics method to get input topics from the source location. (Should have been part of previous commit.) Also, added expectedHealthCheckInterval to conf.yaml for testing. (apache#4175).

Fixed license formatting by running mvn license:format (apache#4175).

Added logic and tests to allow healthCheck to kill instances that aren't receiving their regular health checks. Still needs an end-to-end test involving FunctionManager to check for possible issues that could kill instances incorrectly (apache#4175).

Removed inputTopics field from FunctionContext (apache#4175).

Adding the progress I've made so far on migrating the Prometheus code to Go... currently blocked due to missing methods from the Go client. Waiting for information from the Prometheus maintainers to find a workaround. (apache#4175).

Fixed license check. (apache#4175)

Reverting the last two commits since they should go into a separate PR. (apache#4174).

Re-added test file that was accidentially deleted (apache#4175).

Added a few comments to make review easier (apache#4175).

Made minor (non-functional) changes as per PR review (apache#4175).

Fixed print statements (apache#4175).

Re-added comment after getting maven license formatting correct (apache#4175).

Removed comment that I forgot to remove (apache#4175).

Fixed formatting issues for style check (apache#4175).

Updated gRPC test to no longer use deprecated method (apache#4175).

Fixed more formatting issues by using goimports (apache#4175).

Fixed even more formatting issues (apache#4175).

Fixed yet even more formatting issues (apache#4175).

Added statistics functionality for supporting Prometheus and stats and status commands on Go functions. Needs testing. Also, needs review of specific locations of stats method calls to ensure we're collecting data in the right places. Also, still needs the 1m interval stats to be created. Upstream Prometheus changes prevented us from using the existing approaches for collecting these stats.
@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.