From a7112a309af264269510c1bab70df7e57648bfab Mon Sep 17 00:00:00 2001
From: luo rixin
Date: Wed, 28 Sep 2022 09:49:52 +0800
Subject: [PATCH] bluestore: fix OSD superblock read error
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On aarch64 with a 64K page size, the OSD writes the superblock to 8k-12k
with direct write and writes the device label to 0-4k with buffered write.
The OS flushes buffered writes in page-aligned units, and the superblock and
the device label sit in the same page, so the flush overwrites the superblock
and the correct data can no longer be read back.
Switch the device label write to direct write to fix this.

Signed-off-by: luo rixin
---
 ...store-use-direct-write-for-bdevlabel.patch | 43 +++++++++++++++++++
 ceph.spec                                     |  6 ++-
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 0007-bluestore-use-direct-write-for-bdevlabel.patch

diff --git a/0007-bluestore-use-direct-write-for-bdevlabel.patch b/0007-bluestore-use-direct-write-for-bdevlabel.patch
new file mode 100644
index 0000000..e032efe
--- /dev/null
+++ b/0007-bluestore-use-direct-write-for-bdevlabel.patch
@@ -0,0 +1,43 @@
+From 7672ceb4f09c81ee7a2d5e8672e2c402c3206b4e Mon Sep 17 00:00:00 2001
+From: luo rixin
+Date: Wed, 14 Sep 2022 19:50:01 +0800
+Subject: [PATCH] os/bluestore: use direct write in
+ BlueStore::_write_bdev_label
+
+On AArch64 with a 64K kernel page size, deploying an OSD occasionally
+fails with "OSD::init(): unable to read osd superblock". BlueStore
+writes the superblock at 0x2000~1000 with direct I/O, while
+BlueStore::_write_bdev_label writes the label at 0x0~1000 with
+buffered I/O. The OS flushes buffered writes aligned to the page size,
+so the flush overwrites the superblock (0x2000~1000). Use direct write
+to avoid overwriting the superblock.
+
+Fixes: https://tracker.ceph.com/issues/57537
+Signed-off-by: luo rixin
+---
+ src/os/bluestore/BlueStore.cc | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/src/os/bluestore/BlueStore.cc b/src/os/bluestore/BlueStore.cc
+index 8b893be79d1..534fe780f27 100644
+--- a/src/os/bluestore/BlueStore.cc
++++ b/src/os/bluestore/BlueStore.cc
+@@ -5104,13 +5104,14 @@ int BlueStore::_write_bdev_label(CephContext *cct,
+   z.zero();
+   bl.append(std::move(z));
+ 
+-  int fd = TEMP_FAILURE_RETRY(::open(path.c_str(), O_WRONLY|O_CLOEXEC));
++  int fd = TEMP_FAILURE_RETRY(::open(path.c_str(), O_WRONLY|O_CLOEXEC|O_DIRECT));
+   if (fd < 0) {
+     fd = -errno;
+     derr << __func__ << " failed to open " << path << ": " << cpp_strerror(fd)
+          << dendl;
+     return fd;
+   }
++  bl.rebuild_aligned_size_and_memory(BDEV_LABEL_BLOCK_SIZE, BDEV_LABEL_BLOCK_SIZE, IOV_MAX);
+   int r = bl.write_fd(fd);
+   if (r < 0) {
+     derr << __func__ << " failed to write to " << path
+-- 
+2.20.1.windows.1
+
diff --git a/ceph.spec b/ceph.spec
index 2969bc4..87df6ae 100644
--- a/ceph.spec
+++ b/ceph.spec
@@ -125,7 +125,7 @@
 #################################################################################
 Name: ceph
 Version: 16.2.7
-Release: 5
+Release: 6
 %if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
 Epoch: 2
 %endif
@@ -149,6 +149,7 @@ Patch3: 0003-isa-l-update.patch
 Patch4: 0004-cmake-add-support-python-3.10.patch
 Patch5: 0005-ceph-volume-lvm-api-function-no-undefined.patch
 Patch6: 0006-ceph-volume-decrease-number-of-pvs-calls-in-lvm-list.patch
+Patch7: 0007-bluestore-use-direct-write-for-bdevlabel.patch
 %if 0%{?suse_version}
 # _insert_obs_source_lines_here
 ExclusiveArch: x86_64 aarch64 ppc64le s390x
@@ -2487,6 +2488,9 @@ exit 0
 %config %{_sysconfdir}/prometheus/ceph/ceph_default_alerts.yml
 
 %changelog
+* Wed Sep 28 2022 luo rixin - 2:16.2.7-6
+- fix osd read superblock error
+
 * Thu Aug 25 2022 yangxiaoliang - 2:16.2.7-5
 - fix ceph-volume lvm list calls many times pvs