Newly deployed TiDB fails on tiup cluster start: pd-2379.service fails to start, timed out waiting for port 2379 to be started after 2m0s

To help us work efficiently, please provide the following information; a clear problem description gets resolved faster:
[TiDB Environment]
[Summary] A newly deployed TiDB cluster reports an error on startup
Error: failed to start pd: failed to start: 192.168.38.64 pd-2379.service, please check the instance's log(/tidb-deploy/pd-2379/log) for more detail.: timed out waiting for port 2379 to be started after 2m0s

[Background] Operations performed so far
Nothing yet; this was the first attempt to start the cluster
[Symptoms] Application and database symptoms
[Business Impact]
[TiDB Version]
[Attachments]
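A minimal way to inspect the log path named in the error, on the PD host 192.168.38.64 (a sketch; the exact file names under the log directory may differ, pd.log is only the usual name):

```shell
# On the PD host; path taken from the error message above
ls -l /tidb-deploy/pd-2379/log/
tail -n 100 /tidb-deploy/pd-2379/log/*.log   # pd.log, if PD got far enough to write one
```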

Contents of topology.yaml:

# Global variables are applied to all deployments and used as the default value of
# the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

# Monitored variables are applied to all the machines.
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: "/tidb-deploy/monitored-9100"
  data_dir: "/tidb-data/monitored-9100"
  log_dir: "/tidb-deploy/monitored-9100/log"

# Server configs are used to specify the runtime configuration of TiDB components.
# All configuration items can be found in TiDB docs:
# - TiDB: https://pingcap.com/docs/stable/reference/configuration/tidb-server/configuration-file/
# - TiKV: https://pingcap.com/docs/stable/reference/configuration/tikv-server/configuration-file/
# - PD: https://pingcap.com/docs/stable/reference/configuration/pd-server/configuration-file/
# All configuration items use points to represent the hierarchy, e.g:
#   readpool.storage.use-unified-pool
# You can overwrite this configuration via the instance-level config field.

server_configs:
  tidb:
    log.slow-threshold: 300
    binlog.enable: false
    binlog.ignore-error: false
  tikv:
    # server.grpc-concurrency: 4
    # raftstore.apply-pool-size: 2
    # raftstore.store-pool-size: 2
    # rocksdb.max-sub-compactions: 1
    # storage.block-cache.capacity: "16GB"
    # readpool.unified.max-thread-count: 12
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    replication.enable-placement-rules: true
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64

tidb_servers:
  - host: 192.168.38.63
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: "/tidb-deploy/tidb-4000"
    log_dir: "/tidb-deploy/tidb-4000/log"
    numa_node: "0,1"
    # The following configs are used to overwrite the server_configs.tidb values.
    config:
      log.slow-query-file: tidb-slow-overwrited.log

pd_servers:
  - host: 192.168.38.64
    ssh_port: 22
    name: "pd-1"
    client_port: 2379
    peer_port: 2380
    deploy_dir: "/tidb-deploy/pd-2379"
    data_dir: "/tidb-data/pd-2379"
    log_dir: "/tidb-deploy/pd-2379/log"
    numa_node: "0,1"
    # The following configs are used to overwrite the server_configs.pd values.
    config:
      schedule.max-merge-region-size: 20
      schedule.max-merge-region-keys: 200000

tikv_servers:
  - host: 192.168.38.60
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: "/tidb-deploy/tikv-20160"
    data_dir: "/tidb-data/tikv-20160"
    log_dir: "/tidb-deploy/tikv-20160/log"
    numa_node: "0,1"
    # The following configs are used to overwrite the server_configs.tikv values.
    config:
      server.grpc-concurrency: 4
      server.labels: { zone: "zone1", dc: "dc1", host: "host1" }
  - host: 192.168.38.61
  - host: 192.168.38.62

tiflash_servers:
  - host: 192.168.38.65

monitoring_servers:
  - host: 192.168.38.64
    ssh_port: 22
    port: 9090
    deploy_dir: "/tidb-deploy/prometheus-8249"
    data_dir: "/tidb-data/prometheus-8249"
    log_dir: "/tidb-deploy/prometheus-8249/log"

grafana_servers:
  - host: 192.168.38.64
    port: 3000
    deploy_dir: /tidb-deploy/grafana-3000

alertmanager_servers:
  - host: 192.168.38.64
    ssh_port: 22
    web_port: 9093
    cluster_port: 9094
    deploy_dir: "/tidb-deploy/alertmanager-9093"
    data_dir: "/tidb-data/alertmanager-9093"
    log_dir: "/tidb-deploy/alertmanager-9093/log"
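Side note: this topology and the target hosts can be validated with tiup's built-in checker before starting the cluster (a sketch, assuming a tiup cluster version that provides the check subcommand; tidb-test is the cluster name used in the deploy command later in this thread):

```shell
# Run the environment checks against the topology file (before deploy)...
tiup cluster check ./topology.yaml --user root -p
# ...or against the already-deployed cluster
tiup cluster check tidb-test --cluster
```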

/tidb-deploy/pd-2379/log
There is no log information in this directory.
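Since the PD log directory is empty, the systemd journal on the PD host may be the only place with output (a sketch, using the unit name from the error message):

```shell
systemctl status pd-2379.service --no-pager
journalctl -u pd-2379.service -n 100 --no-pager   # anything PD printed before it could open its log file
```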

First check the firewall, network connectivity, and whether the port is already in use.
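For example, on the PD host those checks could look like this (a sketch; firewalld is assumed as the firewall, and 2379/2380 are the PD ports from the topology):

```shell
systemctl status firewalld          # is a firewall running at all?
ss -lntp | grep -E '2379|2380'      # is anything already listening on the PD ports?
```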

The firewall is turned off on all the machines.
What exactly should I check about the network?
These are freshly created virtual machines, so the port is not in use.

Network connectivity.

The network looks fine too... every host can ping the others.

Try running systemctl start pd-2379.service manually on the PD host.
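On 192.168.38.64 that would look roughly like this (a sketch; the unit name comes from the error message):

```shell
sudo systemctl start pd-2379.service
systemctl status pd-2379.service --no-pager
ps -ef | grep pd-server      # did a pd-server process actually come up?
```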

Could someone help me figure out what the problem is? I followed the same steps as before, and earlier deployments in the development environment all worked, so I don't know why it failed this time.

Anyone still around?

Could it be that the NTP service is not running?

It still feels like a network problem. After systemctl start, can you see the PD process?

Check the PD log to see what the specific cause of the error is.

The PD log is empty.

What is the NTP service?

As @h5n1 said, after starting the service, run ps to look for the pd-server process. If the process exists, run telnet pdhost pdport from the other two PD hosts to confirm network connectivity.
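Concretely, something along these lines (a sketch; 2379/2380 are the PD client/peer ports from the topology):

```shell
ps -ef | grep pd-server       # on the PD host: is the process there?
telnet 192.168.38.64 2379     # from another host: is the client port reachable?
telnet 192.168.38.64 2380     # and the peer port
```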

Let me try again and see.

If it were a network problem, wouldn't it have failed already when I ran the deployment?
tiup cluster deploy tidb-test v5.2.0 ./topology.yaml --user root -p completed successfully; it is only the start step that times out.

A port conflict can behave exactly like this: deployment succeeds, but starting or stopping the cluster fails. You could try switching to a different port.
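For reference, the PD ports are set in the pd_servers entry of topology.yaml; the values below are only an example (not a recommendation) and would mean redeploying with the changed file:

```yaml
pd_servers:
  - host: 192.168.38.64
    client_port: 12379   # example alternative to 2379
    peer_port: 12380     # example alternative to 2380
```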

I ran into this once before: after NTP went down, the nodes would not start.
https://docs.pingcap.com/zh/tidb/stable/check-before-deployment#检测及安装-ntp-服务
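A quick check along the lines of that doc, on each host (a sketch; whether the daemon is ntpd or chronyd depends on the OS):

```shell
ntpstat                    # reports "synchronised to NTP server ..." when time sync is healthy
systemctl status ntpd      # or: systemctl status chronyd / chronyc tracking
```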