sentencepiece/Removed-codes-where-Zero-Width-Joiner-replaced-with-.patch

From 82b8b6f61403fcfcef673ee49ed2dfe475ba4cf2 Mon Sep 17 00:00:00 2001
From: Sarubi <stsarut@gmail.com>
Date: Tue, 23 Feb 2021 20:47:25 +0530
Subject: [PATCH] Removed codes where Zero Width Joiner replaced with whitespace.
---
data/nmt_nfkc.tsv | 3 +--
data/nmt_nfkc_cf.tsv | 3 +--
src/builder.cc | 1 -
3 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/data/nmt_nfkc.tsv b/data/nmt_nfkc.tsv
index 1ce2b71..5c8b48b 100644
--- a/data/nmt_nfkc.tsv
+++ b/data/nmt_nfkc.tsv
@@ -57263,8 +57263,7 @@ FB9 F90 FB5 # ྐྵ => ྐྵ
200A 20 # =>
200B 20 # =>
200C 20 # =>
-200D 20 # =>
-200E 20 # =>
+200E 20 # =>
200F 20 # =>
2011 2010 # =>
2017 20 333 # ‗ => ̳
diff --git a/data/nmt_nfkc_cf.tsv b/data/nmt_nfkc_cf.tsv
index 2178882..0d0e708 100644
--- a/data/nmt_nfkc_cf.tsv
+++ b/data/nmt_nfkc_cf.tsv
@@ -57980,8 +57980,7 @@ FB9 F90 FB5 # ྐྵ => ྐྵ
200A 20 # =>
200B 20 # =>
200C 20 # =>
-200D 20 # =>
-200E 20 # =>
+200E 20 # =>
200F 20 # =>
2011 2010 # =>
2017 20 333 # ‗ => ̳
diff --git a/src/builder.cc b/src/builder.cc
index d9442d3..9f47aac 100644
--- a/src/builder.cc
+++ b/src/builder.cc
@@ -366,7 +366,6 @@ util::Status Builder::BuildNmtNFKCMap(CharsMap *chars_map) {
   nfkc_map[{0xFEFF}] = {0x20};  // ZERO WIDTH NO-BREAK
   nfkc_map[{0xFFFD}] = {0x20};  // REPLACEMENT CHARACTER
   nfkc_map[{0x200C}] = {0x20};  // ZERO WIDTH NON-JOINER
-  nfkc_map[{0x200D}] = {0x20};  // ZERO WIDTH JOINER
   // Ascii Control characters
   nfkc_map[{0x0001}] = {};
--