Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

race when updating labels and store topology runs concurrently #58405

Open
you06 opened this issue Dec 19, 2024 · 2 comments
Open

race when updating labels and store topology runs concurrently #58405

you06 opened this issue Dec 19, 2024 · 2 comments
Labels
severity/moderate type/bug The issue is confirmed as a bug.

Comments

@you06
Copy link
Contributor

you06 commented Dec 19, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

Found by @lcwangchao 's great code review.

1. Minimal reproduce step (Required)

  1. Add a test into pkg/server/handler/tests/http_handler_test.go.
func TestSetLabelsConcurrentWithStoreTopology(t *testing.T) {
	ts := createBasicHTTPHandlerTestSuite()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ts.startServer(t)
	defer ts.stopServer(t)

	integration.BeforeTestExternal(t)
	cluster := integration.NewClusterV3(t, &integration.ClusterConfig{Size: 1})
	defer cluster.Terminate(t)
	client := cluster.RandClient()
	infosync.SetEtcdClient(client)

	ts.domain.InfoSyncer().Restart(ctx)
	ts.domain.InfoSyncer().RestartTopology(ctx)

	testUpdateLabels := func() {
		labels := map[string]string{}
		labels["zone"] = fmt.Sprintf("z-%v", rand.Intn(100000))
		buffer := bytes.NewBuffer([]byte{})
		require.Nil(t, json.NewEncoder(buffer).Encode(labels))
		resp, err := ts.PostStatus("/labels", "application/json", buffer)
		require.NoError(t, err)
		require.NotNil(t, resp)
		defer func() {
			require.NoError(t, resp.Body.Close())
		}()
		require.Equal(t, http.StatusOK, resp.StatusCode)
		newLabels := config.GetGlobalConfig().Labels
		require.Equal(t, newLabels, labels)
	}
	testStoreTopology := func() {
		require.NoError(t, ts.domain.InfoSyncer().StoreTopologyInfo(context.Background()))
	}

	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-done:
				return
			default:
				testStoreTopology()
			}
		}
	}()
	for i := 0; i < 100; i++ {
		testUpdateLabels()
	}
	close(done)

	// reset the global variable
	config.UpdateGlobal(func(conf *config.Config) {
		conf.Labels = map[string]string{}
	})
}
  1. Run the test with race detection.
go test ./pkg/server/handler/tests -run TestSetLabelsConcurrentWithStoreTopology -race

2. What did you expect to see? (Required)

Test pass.

3. What did you see instead (Required)

WARNING: DATA RACE
Write at 0x00c00a78ece8 by goroutine 1975:
  github.com/pingcap/tidb/pkg/domain/infosync.UpdateServerLabel()
      github.com/pingcap/tidb/pkg/domain/infosync/info.go:443 +0x33b
  github.com/pingcap/tidb/pkg/server/handler/tikvhandler.LabelHandler.ServeHTTP()
      github.com/pingcap/tidb/pkg/server/handler/tikvhandler/tikv_handler.go:2011 +0x55a
  github.com/pingcap/tidb/pkg/server/handler/tikvhandler.(*LabelHandler).ServeHTTP()
      <autogenerated>:1 +0x58
  github.com/gorilla/mux.(*Router).ServeHTTP()
      github.com/gorilla/mux@v1.8.1/mux.go:212 +0x371
  net/http.(*ServeMux).ServeHTTP()
      net/http/server.go:2747 +0x255
  github.com/pingcap/tidb/pkg/server/internal/util.CorsHandler.ServeHTTP()
      github.com/pingcap/tidb/pkg/server/internal/util/util.go:219 +0x29e
  github.com/pingcap/tidb/pkg/server/internal/util.(*CorsHandler).ServeHTTP()
      <autogenerated>:1 +0x74
  net/http.serverHandler.ServeHTTP()
      net/http/server.go:3210 +0x2a1
  net/http.(*conn).serve()
      net/http/server.go:2092 +0x12a4
  net/http.(*Server).Serve.gowrap3()
      net/http/server.go:3360 +0x4f

Previous read at 0x00c00a78ece8 by goroutine 1928:
  reflect.typedmemmove()
      runtime/mbarrier.go:225 +0x0
  reflect.copyVal()
      reflect/value.go:2048 +0x5b
  reflect.(*MapIter).Value()
      reflect/value.go:1945 +0x124
  encoding/json.mapEncoder.encode()
      encoding/json/encode.go:760 +0x61a
  encoding/json.mapEncoder.encode-fm()
      <autogenerated>:1 +0x84
  encoding/json.structEncoder.encode()
      encoding/json/encode.go:715 +0x2bd
  encoding/json.structEncoder.encode-fm()
      <autogenerated>:1 +0xe4
  encoding/json.(*encodeState).reflectValue()
      encoding/json/encode.go:322 +0x83
  encoding/json.(*encodeState).marshal()
      encoding/json/encode.go:298 +0xea
  encoding/json.Marshal()
      encoding/json/encode.go:164 +0x12b
  github.com/pingcap/tidb/pkg/domain/infosync.(*InfoSyncer).StoreTopologyInfo()
      github.com/pingcap/tidb/pkg/domain/infosync/info.go:724 +0xf5
  github.com/pingcap/tidb/pkg/server/handler/tests.TestSetLabelsConcurrentWithStoreTopology.func2()
      github.com/pingcap/tidb/pkg/server/handler/tests/http_handler_test.go:1564 +0x84
  github.com/pingcap/tidb/pkg/server/handler/tests.TestSetLabelsConcurrentWithStoreTopology.func3()
      github.com/pingcap/tidb/pkg/server/handler/tests/http_handler_test.go:1574 +0x35

4. What is your TiDB version? (Required)

nightly

@you06 you06 added the type/bug The issue is confirmed as a bug. label Dec 19, 2024
@you06
Copy link
Contributor Author

you06 commented Dec 19, 2024

BTW, I'm using SetEtcdClient in test, and there is also a data race in this function(L1333 and L810). We'd better deprecate SetEtcdClient, and replace it with a failpoint injection.

func SetEtcdClient(etcdCli *clientv3.Client) {
is, err := getGlobalInfoSyncer()
if err != nil {
return
}
is.etcdCli = etcdCli
}

// Done returns a channel that closes when the info syncer is no longer being refreshed.
func (is *InfoSyncer) Done() <-chan struct{} {
if is.etcdCli == nil {
return make(chan struct{}, 1)
}
return is.session.Done()
}

@you06
Copy link
Contributor Author

you06 commented Dec 19, 2024

I prefer moving the Labels out of the ServerInfo and keep it immutable and static as the comment states.

// It will not be updated when tidb-server running. So please only put static information in ServerInfo struct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity/moderate type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

1 participant