Clickhouse 字符串函数

2022-10-25 leiting (9524阅读)

标签数据库

1.基本字符串操作

1.1判断是否为空

clickhouse> select empty(''),empty(' '),notEmpty(''),notEmpty(' ');

┌─empty('')─┬─empty(' ')─┬─notEmpty('')─┬─notEmpty(' ')─┐

│ 1 │ 0 │ 0 │ 1 │

└───────────┴────────────┴──────────────┴───────────────┘

1 rows in set. Elapsed: 0.007 sec.

clickhouse的函数名严格区分大小写。

'' 表示空字符串

' ' 表示的一个空格

1.2拼接

concat

clickhouse> select concat('China',' Huawei',' Shenzhen') str;

┌─str───────────────────┐

│ China Huawei Shenzhen │

└───────────────────────┘

clickhouse> select concat('China', null) str;

┌─str──┐

│ ᴺᵁᴸᴸ │

└──────┘

1.3截取：

substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)

三个函数的用法和含义是一样的。

clickhouse> select substring('china',1,2) str;

┌─str─┐

│ ch │

└─────┘

1.4字符串长度

length

char_length, CHAR_LENGTH

character_length, CHARACTER_LENGTH

clickhouse> select length('China'),char_length('China'),character_length('China');

┌─length('China')─┬─char_length('China')─┬─character_length('China')─┐

│ 5 │ 5 │ 5 │

└─────────────────┴──────────────────────┴───────────────────────────┘

clickhouse> select length('中国'),char_length('中国'),character_length('中国');

┌─length('中国')─┬─char_length('中国')─┬─character_length('中国')─┐

│ 6 │ 2 │ 2 │

└────────────────┴─────────────────────┴──────────────────────────┘

1.5字符串重复:

repeat(s, n)

clickhouse> select repeat('China ',5)

┌─repeat('China ', 5)────────────┐

│ China China China China China │

└────────────────────────────────┘

1.6字符串反转：

reverse

1.7追加：

appendTrailingCharIfAbsent(s, c)

1.8获取字符串的开头结尾:

endsWith(s, suffix) startsWith(str, prefix)

1.9 字符串大小写转换:

lower, lcase

upper, ucase

1.10 字符串trim函数:trim

trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)

trimLeft(input_string) ==ltrim(input_string)

trimRight(input_string) ==rtrim(input_string)

trimBoth(input_string) ==trim(input_string)

1.11 字符串加密和解密:

base64Encode(s)

base64Decode(s)

tryBase64Decode(s)

CRC32(s)

CRC32IEEE(s)

CRC64(s)

1.12 转换函数：

convertCharset(s, from, to)

1.13其他函数：

concatAssumeInjective

format(pattern, s0, s1, …)

isValidUTF8

toValidUTF8

normalizeQuery

normalizedQueryHash

2.字符串查找

2.1 字符串定位：

position(haystack, needle), locate(haystack, needle),positionCaseInsensitive

语法：

position(haystack, needle[, start_pos])

locate(haystack, needle[, start_pos])

positionCaseInsensitive(haystack, needle[, start_pos])

2.2 多个字符串定位：

multiSearchAllPositions,multiSearchAllPositionsCaseInsensitive

multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])

multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])

multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])

multiSearchAny(haystack, [needle1, needle2, …, needlen])

2.3 字符串匹配：

match(haystack, pattern)

multiMatchAny(haystack, [pattern1, pattern2, …, patternn])

multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])

multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])

2.4 字符串提取：

extract(haystack, pattern)

extractAll(haystack, pattern)

extractAllGroupsHorizontal

extractAllGroupsVertical

注意：extractAllGroupsHorizontal function is slower than extractAllGroupsVertical.

2.5字符串模糊查询：

like(haystack, pattern), haystack LIKE pattern operator

notLike(haystack, pattern), haystack NOT LIKE pattern operator

ilike

ilike(haystack, pattern)

2.6

ngramDistance(haystack, needle)

ngramSearch(haystack, needle)

2.7字符串出现的次数：

countSubstrings(haystack, needle)

countSubstrings(haystack, needle[, start_pos])

类似的函数：

countSubstrings()/countSubstringsCaseInsensitive()/countSubstringsCaseInsensitiveUTF8()

该功能在clickhouse 20.12.+版本实现。

clickhouse> select countSubstrings('I love China,he love china too,she also love china','china') china_count;

3.字符串替换

replaceOne(haystack, pattern, replacement)

replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)

replaceRegexpOne(haystack, pattern, replacement)

replaceRegexpAll(haystack, pattern, replacement)

regexpQuoteMeta(s)

Predefined characters: \0, \\, |, (, ), ^, $, ., [, ], ?, *, +, {, :, -.

4.字符串切割

splitByChar(separator, s)

splitByString(separator, s)

arrayStringConcat(arr[, separator])

alphaTokens(s)

extractAllGroups(text, regexp)

5.UTF8 相关的函数：

Clickhouse> select * from system.functions where name like '%UTF%';

SELECT *

FROM system.functions

WHERE name LIKE '%UTF%'

┌─name────────────────────────────────────────┬─is_aggregate─┬─case_insensitive─┬─alias_to─┐

│ ngramSearchCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ ngramDistanceCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ multiSearchFirstIndexCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ multiSearchAnyUTF8 │ 0 │ 0 │ │

│ positionCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ positionUTF8 │ 0 │ 0 │ │

│ reverseUTF8 │ 0 │ 0 │ │

│ lengthUTF8 │ 0 │ 0 │ │

│ ngramSearchUTF8 │ 0 │ 0 │ │

│ randomStringUTF8 │ 0 │ 0 │ │

│ javaHashUTF16LE │ 0 │ 0 │ │

│ multiSearchFirstIndexUTF8 │ 0 │ 0 │ │

│ isValidUTF8 │ 0 │ 0 │ │

│ multiSearchAllPositionsCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ multiSearchFirstPositionCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ lowerUTF8 │ 0 │ 0 │ │

│ multiSearchAllPositionsUTF8 │ 0 │ 0 │ │

│ multiSearchFirstPositionUTF8 │ 0 │ 0 │ │

│ substringUTF8 │ 0 │ 0 │ │

│ multiSearchAnyCaseInsensitiveUTF8 │ 0 │ 0 │ │

│ toValidUTF8 │ 0 │ 0 │ │

│ ngramDistanceUTF8 │ 0 │ 0 │ │

│ upperUTF8 │ 0 │ 0 │ │

└─────────────────────────────────────────────┴──────────────┴──────────────────┴──────────┘

23 rows in set. Elapsed: 0.016 sec.

参考：

https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-replace-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/

————————————————

原文链接：https://blog.csdn.net/vkingnew/article/details/108229090

IT PHP 编程语言开发编程 Linux 科技 Elasticsearch 数据库面试 HTML/CSS/XML 网络 JAVA NoSQL 操作系统 C/C++ Golang Git 算法正则表达式 Redis 互联网 MySql 软件运维 JavaScript 国际架构设计商业 Mac OS TCP/IP Excel Windows Oracle Socket VR Vim MongoDB 运营 Python MemCache 硬件电子娱乐设计摄影 nginx 游戏 WordPress HTTP 团建数码电器 Docker 大模型

公务机包机如何实现全球旅程无缝衔接？ Debian Linux下 vim打开文件，鼠标选中后进入可视模式无法复制粘贴问题的解决李长荣企业如何以可持续治理为基石，推动多元文化与循环经济并进？塑料改性与SBS改性沥青的性能差异科学分析中样品瓶选择对色谱与质谱检测结果的影响及 SureSTART 解决方案 golang：make和new有什么区别？李长荣企业如何以可持续治理为基石，推动多元文化与循环经济并进？

略微加速

略速 - 互联网笔记