Module re_tuner

Helper functions for working with the Erlang re (regular expression) module.

Copyright © 2021 by Anatolii Kosorukov

Authors: Anatolii Kosorukov (java1cprog@yandex.ru) [web site: rustkas.github.io/].

Description

Helper functions for working with the Erlang re (regular expression) module.

Data Types

compile_option()

compile_option() = unicode | anchored | caseless | dollar_endonly | dotall | extended | firstline | multiline | no_auto_capture | dupnames | ungreedy | {newline, nl_spec()} | bsr_anycrlf | bsr_unicode | no_start_optimize | ucp | never_utf

do_action()

do_action() = fun((InputString::string()) -> NewString::string())

mp()

mp() = {re_pattern, term(), term(), term(), term()}

nl_spec()

nl_spec() = cr | crlf | lf | anycrlf | any

Function Index

all_match/2: Retrieve Part of the Matched Text.
avoid_characters/0: The list of characters that raise an error unless they are escaped.
filter/3: Filter Matches in Procedural Code.
first_match/2: Retrieve the Matched Text.
first_match_info/2: Determine the Position and Length of the Match.
first_part_match/2: Retrieve Part of the Matched Text.
is_full_match/2: Check whether a string fits a certain pattern in its entirety.
is_match/2: Check whether a match can be found for a particular regular expression in a particular string.
match_chain/2: Get the matches of one Regex within the matches of another Regex.
match_evaluator/3: Replace Matches with Replacements Generated in Code.
mp/1: A shortened form of the re:compile/1 function.
mp/2: A shortened form of the re:compile/2 function.
replace/1: Replace any of the shorthand patterns from the list [\s, \w, \h, \v] in a pattern string.
replace/3: Replace All Matches.
save_pattern/1: Make a safe Regex pattern that treats every character as a literal.
subfilter/3: Filter a match within another match.
submatch_evaluator/4: Replace All Matches Within the Matches of Another Regex.
tune/1: Replace a Regex pattern with a simpler one.
unicode_block/1: The Unicode character database divides all the code points into blocks.

Function Details

all_match/2

all_match(Text, ReInput) -> Result

Text: subject string

returns: A list

Retrieve Part of the Matched Text. You have a regular expression that matches a substring of the subject text. You want to match just one part of that substring. To isolate the part you want, you added a capturing group to your regular expression.
See also: re:run/3, erlang:hd/1, lists:map/2.

See also: re_tuner:mp/1.
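
A minimal sketch of how a list of matches could be collected with re:run/3, erlang:hd/1 and lists:map/2, the functions referenced above. The module name all_match_sketch and the empty-list result for no match are assumptions made for illustration; the actual re_tuner implementation may differ.

```erlang
-module(all_match_sketch).
-export([all_match/2]).

%% Collect the text of every match of Regex in Text.
%% With the global option, re:run/3 returns one inner list per match;
%% {capture, first, list} keeps only the whole match, as a string.
all_match(Text, Regex) ->
    case re:run(Text, Regex, [global, {capture, first, list}]) of
        {match, Matches} -> lists:map(fun erlang:hd/1, Matches);
        nomatch -> []   % assumption: no match yields an empty list
    end.
```

For example, all_match_sketch:all_match("Do you like 13 or 42?", "\\d+") evaluates to ["13","42"].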

avoid_characters/0

avoid_characters() -> Result

returns: The list of special characters.

The list of characters that raise an error unless they are escaped.

filter/3

filter(Text, ReInput, Function) -> Result

Text: subject string
Function: filter function

returns: A list

Filter Matches in Procedural Code. Retrieve a list of all matches a regular expression can find in a string when it is applied repeatedly to the remainder of the string after each match. Get a list of matches that meet certain extra criteria that you cannot (easily) express in a regular expression.
See also: lists:filter/2.

See also: re_tuner:all_match/2, re_tuner:mp/1.
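
The idea can be sketched with re:run/3 and lists:filter/2. The module name filter_sketch, the argument name Pred, and the empty-list result for no match are assumptions; this is not necessarily how re_tuner implements filter/3.

```erlang
-module(filter_sketch).
-export([filter/3]).

%% Find every match of Regex in Text, then keep only the matches
%% for which Pred returns true. Pred receives the matched string.
filter(Text, Regex, Pred) ->
    case re:run(Text, Regex, [global, {capture, first, list}]) of
        {match, Matches} ->
            %% Each element of Matches is a one-element list [Matched].
            lists:filter(Pred, [M || [M] <- Matches]);
        nomatch ->
            []   % assumption: no match yields an empty list
    end.
```

A criterion such as "the number is a multiple of 13" is awkward to express in a regex but easy as a predicate: filter_sketch:filter("13 42 26 7", "\\d+", fun(S) -> list_to_integer(S) rem 13 =:= 0 end) evaluates to ["13","26"].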

first_match/2

first_match(Text, ReInput) -> Result

Text: subject string

returns: A string as a result

Retrieve the Matched Text. You have a regular expression that matches a part of the subject text, and you want to extract the text that was matched. If the regular expression can match the string more than once, you want only the first match.
See also: re:run/3.

See also: re_tuner:mp/1.
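
A minimal sketch of the technique using re:run/3 directly. The module name first_match_sketch and the nomatch return value are assumptions; the re_tuner function may report a missing match differently.

```erlang
-module(first_match_sketch).
-export([first_match/2]).

%% Return the text of the first match of Regex in Text.
%% Without the global option, re:run/3 stops at the first match.
first_match(Text, Regex) ->
    case re:run(Text, Regex, [{capture, first, list}]) of
        {match, [Matched]} -> Matched;
        nomatch -> nomatch   % assumption: pass nomatch through
    end.
```

For example, first_match_sketch:first_match("Do you like 13 or 42?", "\\d+") evaluates to "13".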

first_match_info/2

first_match_info(Text, ReInput) -> Result

Text: subject string

returns: Tuples as a result

Determine the Position and Length of the Match. Instead of extracting the substring matched by the regular expression you want to determine the starting position and length of the match. With this information, you can extract the match in your own code or apply whatever processing you fancy on the part of the original string matched by the regex.
See also: re:run/3.

See also: re_tuner:mp/1.
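
A sketch of how position and length can be obtained from re:run/3. The module name first_match_info_sketch and the exact shape of the result are assumptions. Note that re:run/3 reports a zero-based offset, while lists:sublist/3 counts from 1.

```erlang
-module(first_match_info_sketch).
-export([first_match_info/2]).

%% Return {Start, Length} of the first match, where Start is the
%% zero-based offset reported by re:run/3.
first_match_info(Text, Regex) ->
    case re:run(Text, Regex, [{capture, first, index}]) of
        {match, [{Start, Length}]} -> {Start, Length};
        nomatch -> nomatch   % assumption: pass nomatch through
    end.
```

For example, first_match_info_sketch:first_match_info("Do you like 13 or 42?", "\\d+") evaluates to {12,2}, and lists:sublist("Do you like 13 or 42?", 12 + 1, 2) then extracts "13".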

first_part_match/2

first_part_match(Text, ReInput) -> Result

Text: subject string

returns: A string

Retrieve Part of the Matched Text. You have a regular expression that matches a substring of the subject text. You want to match just one part of that substring. To isolate the part you want, you added a capturing group to your regular expression.
See also: re:run/3.

See also: re_tuner:mp/1.
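
A sketch of extracting the first capturing group with re:run/3. The module name first_part_match_sketch, the choice of group 1, and the nomatch return value are assumptions made for illustration.

```erlang
-module(first_part_match_sketch).
-export([first_part_match/2]).

%% Return the text captured by the first capturing group of Regex.
%% {capture, [1], list} asks re:run/3 for group 1 only, as a string.
first_part_match(Text, Regex) ->
    case re:run(Text, Regex, [{capture, [1], list}]) of
        {match, [Part]} -> Part;
        nomatch -> nomatch   % assumption: pass nomatch through
    end.
```

For example, first_part_match_sketch:first_part_match("See http://www.example.com/index.html", "http://([a-z0-9.-]+)") evaluates to "www.example.com".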

is_full_match/2

is_full_match(Text, ReInput) -> Result

Text: subject string

returns: true or false

Check whether a string fits a certain pattern in its entirety. A partial match is not sufficient.
See also: re:run/3.

See also: re_tuner:mp/1.
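
One way to require a full match is to wrap the pattern in \A(?:...)\z before running it. The sketch below assumes the regex is given as a string (list); the module name is_full_match_sketch is hypothetical and re_tuner may use a different technique.

```erlang
-module(is_full_match_sketch).
-export([is_full_match/2]).

%% True only if Regex matches Text from the first to the last character.
%% \A anchors at the start of the subject, \z at its very end; the
%% non-capturing group keeps alternations inside Regex intact.
is_full_match(Text, Regex) ->
    Anchored = "\\A(?:" ++ Regex ++ ")\\z",
    re:run(Text, Anchored, [{capture, none}]) =:= match.
```

For example, is_full_match_sketch:is_full_match("12345", "\\d+") is true, while is_full_match_sketch:is_full_match("12345a", "\\d+") is false.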

is_match/2

is_match(Text, ReInput) -> Result

Text: subject string

returns: true or false

Check whether a match can be found for a particular regular expression in a particular string. A partial match is sufficient.
See also: re:run/3.

See also: re_tuner:mp/1.
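
A minimal sketch of the check with re:run/3. With {capture, none}, re:run/3 returns the atom match or nomatch, which maps directly to a boolean. The module name is_match_sketch is hypothetical.

```erlang
-module(is_match_sketch).
-export([is_match/2]).

%% True if Regex matches anywhere in Text.
is_match(Text, Regex) ->
    re:run(Text, Regex, [{capture, none}]) =:= match.
```

For example, is_match_sketch:is_match("12345a", "\\d+") is true even though the trailing "a" is not part of the match.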

match_chain/2

match_chain(Text, ReList) -> Result

Text: subject string
ReList: regex list

returns: A list

Get the matches of one Regex within the matches of another Regex. This function takes a list of Regexes. Find the matches of a Regex within the matches of another Regex, within the matches of other Regexes, as many levels deep as you want.
See also: lists:map/2.

See also: re_tuner:all_match/2, re_tuner:mp/1.
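
The chaining can be sketched as a fold over the regex list: each regex is applied to every string produced by the previous level. The module name match_chain_sketch and the helper all_matches/2 are assumptions, and the flat list result is a guess at the documented "A list".

```erlang
-module(match_chain_sketch).
-export([match_chain/2]).

%% Apply each regex in ReList to the matches of the previous one.
%% The first regex is applied to Text itself.
match_chain(Text, ReList) ->
    lists:foldl(
      fun(Regex, Subjects) ->
              lists:flatmap(fun(S) -> all_matches(S, Regex) end, Subjects)
      end,
      [Text],
      ReList).

%% Hypothetical helper: all whole-match strings of Regex in Text.
all_matches(Text, Regex) ->
    case re:run(Text, Regex, [global, {capture, first, list}]) of
        {match, Matches} -> [M || [M] <- Matches];
        nomatch -> []
    end.
```

For example, match_chain_sketch:match_chain("<b>1 2</b> 3 <b>4</b>", ["<b>.*?</b>", "\\d+"]) evaluates to ["1","2","4"]: the digit 3 is skipped because it lies outside the bold sections.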

match_evaluator/3

match_evaluator(DoAction, Text, Regex) -> Result

DoAction: a function of type do_action(), fun(InputString) -> NewString
Text: subject string
Regex: regex pattern

returns: A string

Replace Matches with Replacements Generated in Code. Replace all matches of a regular expression with a new string that you build up in procedural code. You want to be able to replace each match with a different string, based on the text that was actually matched.
See also: erlang:element/2, string:slice/3, string:length/1, re:run/3.

See also: re_tuner:mp/1.
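
A sketch of the approach: find the position of every match, then rebuild the string, calling DoAction on each matched piece. The module name match_evaluator_sketch and the helper rebuild/5 are hypothetical. The sketch uses lists:sublist/3 instead of string:slice/3 and assumes a plain list subject without the unicode option, so that the offsets from re:run/3 equal list positions.

```erlang
-module(match_evaluator_sketch).
-export([match_evaluator/3]).

%% Replace every match of Regex in Text with DoAction(MatchedText).
match_evaluator(DoAction, Text, Regex) ->
    case re:run(Text, Regex, [global, {capture, first, index}]) of
        nomatch ->
            Text;
        {match, Matches} ->
            Positions = [P || [P] <- Matches],
            rebuild(DoAction, Text, Positions, 0, [])
    end.

%% Walk the matches left to right. Offset is the zero-based position
%% just after the previous match; Acc holds the pieces in reverse order.
rebuild(_DoAction, Text, [], Offset, Acc) ->
    Tail = lists:nthtail(Offset, Text),
    lists:append(lists:reverse([Tail | Acc]));
rebuild(DoAction, Text, [{Start, Len} | Rest], Offset, Acc) ->
    Before = lists:sublist(Text, Offset + 1, Start - Offset),
    Matched = lists:sublist(Text, Start + 1, Len),
    rebuild(DoAction, Text, Rest, Start + Len,
            [DoAction(Matched), Before | Acc]).
```

For example, doubling every number: match_evaluator_sketch:match_evaluator(fun(S) -> integer_to_list(2 * list_to_integer(S)) end, "1 and 20 and 300", "\\d+") evaluates to "2 and 40 and 600".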

mp/1

mp(Regex) -> MP | {error, tuple()}

Regex: regex pattern

returns: Opaque data type containing a compiled regular expression

A shortened form of the re:compile/1 function. Returns an opaque data type containing a compiled regular expression, or raises a badarg error.
See also: mp(), re:compile/1.
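
A sketch of such a wrapper around re:compile/1, which returns {ok, MP} on success and {error, ErrSpec} for an invalid pattern. The module name mp_sketch is hypothetical, and returning the error tuple unchanged is an assumption based on the signature above.

```erlang
-module(mp_sketch).
-export([mp/1]).

%% Compile Regex and unwrap the {ok, MP} tuple.
%% An invalid pattern yields {error, {ErrorString, Position}};
%% an argument of the wrong type makes re:compile/1 raise badarg.
mp(Regex) ->
    case re:compile(Regex) of
        {ok, MP} -> MP;
        {error, ErrSpec} -> {error, ErrSpec}
    end.
```

The compiled value can be passed wherever a regex is expected, for example re:run("a42", mp_sketch:mp("\\d+"), [{capture, first, list}]) evaluates to {match,["42"]}.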

mp/2

mp(Regex, Options) -> MP | {error, badarg}

Regex: regex pattern
Options: additional regular expression metadata

returns: Opaque data type containing a compiled regular expression

A shortened form of the re:compile/2 function. Returns an opaque data type containing a compiled regular expression, or raises a badarg error.
See also: mp(), re:compile/2.

replace/1

replace(Pattern) -> UpdatedPattern

Pattern: searched regex pattern for replacing

returns: Updated Regex pattern string

Replace any of the shorthand patterns from the list [\s, \w, \h, \v] in a pattern string.
See also: lists:foldl/3.
Do not apply the \w shorthand to Unicode content.

replace/3

replace(Text, Regex, Replacement) -> Result

Text: subject string
Regex: regex pattern
Replacement: a replacement string

returns: A string

Replace All Matches. Replace all matches of the regular expression with the replacement text.
See also: re:replace/4.

See also: re_tuner:mp/1.
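
A minimal sketch using re:replace/4 with the global option, so that every match is replaced rather than only the first. The module name replace_sketch is hypothetical. In the replacement string, & and \N (a backslash followed by a group number) have special meaning to re:replace/4.

```erlang
-module(replace_sketch).
-export([replace/3]).

%% Replace every match of Regex in Text with Replacement.
%% {return, list} makes re:replace/4 return a flat string
%% instead of an iodata structure.
replace(Text, Regex, Replacement) ->
    re:replace(Text, Regex, Replacement, [global, {return, list}]).
```

For example, replace_sketch:replace("a b c", "\\s", "-") evaluates to "a-b-c".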

save_pattern/1

save_pattern(Pattern) -> SavePattern

returns: Safe pattern

Make a safe Regex pattern that treats every character as a literal.

subfilter/3

subfilter(Text, OuterReInput, InnerReInput) -> Result

Text: subject string

returns: A list

Filter a match within another match. Find all the matches of a particular regular expression, but only within certain sections of the subject string. Another regular expression matches each of the sections in the string.

See also: re_tuner:all_match/2, re_tuner:mp/1.
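
The two-level search can be sketched with a list comprehension: the outer regex selects the sections, and the inner regex is applied to each section. The module name subfilter_sketch and the helper all_matches/2 are assumptions for illustration.

```erlang
-module(subfilter_sketch).
-export([subfilter/3]).

%% Return every match of InnerRegex found inside a match of OuterRegex.
subfilter(Text, OuterRegex, InnerRegex) ->
    [Inner || Section <- all_matches(Text, OuterRegex),
              Inner <- all_matches(Section, InnerRegex)].

%% Hypothetical helper: all whole-match strings of Regex in Text.
all_matches(Text, Regex) ->
    case re:run(Text, Regex, [global, {capture, first, list}]) of
        {match, Matches} -> [M || [M] <- Matches];
        nomatch -> []
    end.
```

For example, subfilter_sketch:subfilter("1 <b>2</b> 3 <b>4 5</b>", "<b>.*?</b>", "\\d+") evaluates to ["2","4","5"].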

submatch_evaluator/4

submatch_evaluator(Text, OuterRegex, InnerRegex, Replacement) -> Result

Text: subject string
OuterRegex: outer regex pattern
InnerRegex: inner regex pattern
Replacement: a replacement string

returns: A string

Replace All Matches Within the Matches of Another Regex. Replace all the matches of a particular regular expression, but only within certain sections of the subject string. Another regular expression matches each of the sections in the string.
See also: erlang:element/2, string:slice/3, string:length/1, re:run/3.

See also: re_tuner:mp/1, re_tuner:replace/3.
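
A sketch of one possible approach: locate each section with the outer regex, run re:replace/4 with the inner regex on that section only, and stitch the string back together. The module name submatch_sketch and the helper rebuild/6 are hypothetical. The sketch assumes a plain list subject without the unicode option, so that offsets from re:run/3 equal list positions.

```erlang
-module(submatch_sketch).
-export([submatch_evaluator/4]).

%% Replace matches of InnerRegex with Replacement, but only inside
%% the parts of Text matched by OuterRegex.
submatch_evaluator(Text, OuterRegex, InnerRegex, Replacement) ->
    case re:run(Text, OuterRegex, [global, {capture, first, index}]) of
        nomatch ->
            Text;
        {match, Matches} ->
            Positions = [P || [P] <- Matches],
            rebuild(Text, Positions, InnerRegex, Replacement, 0, [])
    end.

%% Offset is the zero-based position just after the previous section;
%% Acc holds the pieces built so far, in reverse order.
rebuild(Text, [], _Inner, _Repl, Offset, Acc) ->
    lists:append(lists:reverse([lists:nthtail(Offset, Text) | Acc]));
rebuild(Text, [{Start, Len} | Rest], Inner, Repl, Offset, Acc) ->
    Before = lists:sublist(Text, Offset + 1, Start - Offset),
    Section = lists:sublist(Text, Start + 1, Len),
    NewSection = re:replace(Section, Inner, Repl, [global, {return, list}]),
    rebuild(Text, Rest, Inner, Repl, Start + Len, [NewSection, Before | Acc]).
```

For example, submatch_sketch:submatch_evaluator("1 <b>2</b> 3 <b>4 5 6</b> 7", "<b>.*?</b>", "\\d+", "X") evaluates to "1 <b>X</b> 3 <b>X X X</b> 7"; the digits outside the bold sections are left untouched.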

tune/1

tune(Regex) -> Result

returns: Transformed Regex pattern.

Replace a Regex pattern with a simpler one.

unicode_block/1

unicode_block(BlockName) -> Range | nomatch

BlockName: regular expression block name

returns: Regular expression range of code points

The Unicode character database divides all the code points into blocks. Each block consists of a single range of code points. The code points U+0000 through U+FFFF are divided into 156 blocks in version 6.1 of the Unicode standard.

  ‹U+0000…U+007F \p{InBasicLatin}›
  ‹U+0080…U+00FF \p{InLatin-1Supplement}›
  ‹U+0100…U+017F \p{InLatinExtended-A}›
  ‹U+0180…U+024F \p{InLatinExtended-B}›
  ‹U+0250…U+02AF \p{InIPAExtensions}›
  ‹U+02B0…U+02FF \p{InSpacingModifierLetters}›
  ‹U+0300…U+036F \p{InCombiningDiacriticalMarks}›
  ‹U+0370…U+03FF \p{InGreekandCoptic}›
  ‹U+0400…U+04FF \p{InCyrillic}›
  ‹U+0500…U+052F \p{InCyrillicSupplement}›
  ‹U+0530…U+058F \p{InArmenian}›
  ‹U+0590…U+05FF \p{InHebrew}›
  ‹U+0600…U+06FF \p{InArabic}›
  ‹U+0700…U+074F \p{InSyriac}›
  ‹U+0750…U+077F \p{InArabicSupplement}›
  ‹U+0780…U+07BF \p{InThaana}›
  ‹U+07C0…U+07FF \p{InNKo}›
  ‹U+0800…U+083F \p{InSamaritan}›
  ‹U+0840…U+085F \p{InMandaic}›
  ‹U+08A0…U+08FF \p{InArabicExtended-A}›
  ‹U+0900…U+097F \p{InDevanagari}›
  ‹U+0980…U+09FF \p{InBengali}›
  ‹U+0A00…U+0A7F \p{InGurmukhi}›
  ‹U+0A80…U+0AFF \p{InGujarati}›
  ‹U+0B00…U+0B7F \p{InOriya}›
  ‹U+0B80…U+0BFF \p{InTamil}›
  ‹U+0C00…U+0C7F \p{InTelugu}›
  ‹U+0C80…U+0CFF \p{InKannada}›
  ‹U+0D00…U+0D7F \p{InMalayalam}›
  ‹U+0D80…U+0DFF \p{InSinhala}›
  ‹U+0E00…U+0E7F \p{InThai}›
  ‹U+0E80…U+0EFF \p{InLao}›
  ‹U+0F00…U+0FFF \p{InTibetan}›
  ‹U+1000…U+109F \p{InMyanmar}›
  ‹U+10A0…U+10FF \p{InGeorgian}›
  ‹U+1100…U+11FF \p{InHangulJamo}›
  ‹U+1200…U+137F \p{InEthiopic}›
  ‹U+1380…U+139F \p{InEthiopicSupplement}›
  ‹U+13A0…U+13FF \p{InCherokee}›
  ‹U+1400…U+167F \p{InUnifiedCanadianAboriginalSyllabics}›
  ‹U+1680…U+169F \p{InOgham}›
  ‹U+16A0…U+16FF \p{InRunic}›
  ‹U+1700…U+171F \p{InTagalog}›
  ‹U+1720…U+173F \p{InHanunoo}›
  ‹U+1740…U+175F \p{InBuhid}›
  ‹U+1760…U+177F \p{InTagbanwa}›
  ‹U+1780…U+17FF \p{InKhmer}›
  ‹U+1800…U+18AF \p{InMongolian}›
  ‹U+18B0…U+18FF \p{InUnifiedCanadianAboriginalSyllabicsExtended}›
  ‹U+1900…U+194F \p{InLimbu}›
  ‹U+1950…U+197F \p{InTaiLe}›
  ‹U+1980…U+19DF \p{InNewTaiLue}›
  ‹U+19E0…U+19FF \p{InKhmerSymbols}›
  ‹U+1A00…U+1A1F \p{InBuginese}›
  ‹U+1A20…U+1AAF \p{InTaiTham}›
  ‹U+1B00…U+1B7F \p{InBalinese}›
  ‹U+1B80…U+1BBF \p{InSundanese}›
  ‹U+1BC0…U+1BFF \p{InBatak}›
  ‹U+1C00…U+1C4F \p{InLepcha}›
  ‹U+1C50…U+1C7F \p{InOlChiki}›
  ‹U+1CC0…U+1CCF \p{InSundaneseSupplement}›
  ‹U+1CD0…U+1CFF \p{InVedicExtensions}›
  ‹U+1D00…U+1D7F \p{InPhoneticExtensions}›
  ‹U+1D80…U+1DBF \p{InPhoneticExtensionsSupplement}›
  ‹U+1DC0…U+1DFF \p{InCombiningDiacriticalMarksSupplement}›
  ‹U+1E00…U+1EFF \p{InLatinExtendedAdditional}›
  ‹U+1F00…U+1FFF \p{InGreekExtended}›
  ‹U+2000…U+206F \p{InGeneralPunctuation}›
  ‹U+2070…U+209F \p{InSuperscriptsandSubscripts}›
  ‹U+20A0…U+20CF \p{InCurrencySymbols}›
  ‹U+20D0…U+20FF \p{InCombiningDiacriticalMarksforSymbols}›
  ‹U+2100…U+214F \p{InLetterlikeSymbols}›
  ‹U+2150…U+218F \p{InNumberForms}›
  ‹U+2190…U+21FF \p{InArrows}›
  ‹U+2200…U+22FF \p{InMathematicalOperators}›
  ‹U+2300…U+23FF \p{InMiscellaneousTechnical}›
  ‹U+2400…U+243F \p{InControlPictures}›
  ‹U+2440…U+245F \p{InOpticalCharacterRecognition}›
  ‹U+2460…U+24FF \p{InEnclosedAlphanumerics}›
  ‹U+2500…U+257F \p{InBoxDrawing}›
  ‹U+2580…U+259F \p{InBlockElements}›
  ‹U+25A0…U+25FF \p{InGeometricShapes}›
  ‹U+2600…U+26FF \p{InMiscellaneousSymbols}›
  ‹U+2700…U+27BF \p{InDingbats}›
  ‹U+27C0…U+27EF \p{InMiscellaneousMathematicalSymbols-A}›
  ‹U+27F0…U+27FF \p{InSupplementalArrows-A}›
  ‹U+2800…U+28FF \p{InBraillePatterns}›
  ‹U+2900…U+297F \p{InSupplementalArrows-B}›
  ‹U+2980…U+29FF \p{InMiscellaneousMathematicalSymbols-B}›
  ‹U+2A00…U+2AFF \p{InSupplementalMathematicalOperators}›
  ‹U+2B00…U+2BFF \p{InMiscellaneousSymbolsandArrows}›
  ‹U+2C00…U+2C5F \p{InGlagolitic}›
  ‹U+2C60…U+2C7F \p{InLatinExtended-C}›
  ‹U+2C80…U+2CFF \p{InCoptic}›
  ‹U+2D00…U+2D2F \p{InGeorgianSupplement}›
  ‹U+2D30…U+2D7F \p{InTifinagh}›
  ‹U+2D80…U+2DDF \p{InEthiopicExtended}›
  ‹U+2DE0…U+2DFF \p{InCyrillicExtended-A}›
  ‹U+2E00…U+2E7F \p{InSupplementalPunctuation}›
  ‹U+2E80…U+2EFF \p{InCJKRadicalsSupplement}›
  ‹U+2F00…U+2FDF \p{InKangxiRadicals}›
  ‹U+2FF0…U+2FFF \p{InIdeographicDescriptionCharacters}›
  ‹U+3000…U+303F \p{InCJKSymbolsandPunctuation}›
  ‹U+3040…U+309F \p{InHiragana}›
  ‹U+30A0…U+30FF \p{InKatakana}›
  ‹U+3100…U+312F \p{InBopomofo}›
  ‹U+3130…U+318F \p{InHangulCompatibilityJamo}›
  ‹U+3190…U+319F \p{InKanbun}›
  ‹U+31A0…U+31BF \p{InBopomofoExtended}›
  ‹U+31C0…U+31EF \p{InCJKStrokes}›
  ‹U+31F0…U+31FF \p{InKatakanaPhoneticExtensions}›
  ‹U+3200…U+32FF \p{InEnclosedCJKLettersandMonths}›
  ‹U+3300…U+33FF \p{InCJKCompatibility}›
  ‹U+3400…U+4DBF \p{InCJKUnifiedIdeographsExtensionA}›
  ‹U+4DC0…U+4DFF \p{InYijingHexagramSymbols}›
  ‹U+4E00…U+9FFF \p{InCJKUnifiedIdeographs}›
  ‹U+A000…U+A48F \p{InYiSyllables}›
  ‹U+A490…U+A4CF \p{InYiRadicals}›
  ‹U+A4D0…U+A4FF \p{InLisu}›
  ‹U+A500…U+A63F \p{InVai}›
  ‹U+A640…U+A69F \p{InCyrillicExtended-B}›
  ‹U+A6A0…U+A6FF \p{InBamum}›
  ‹U+A700…U+A71F \p{InModifierToneLetters}›
  ‹U+A720…U+A7FF \p{InLatinExtended-D}›
  ‹U+A800…U+A82F \p{InSylotiNagri}›
  ‹U+A830…U+A83F \p{InCommonIndicNumberForms}›
  ‹U+A840…U+A87F \p{InPhags-pa}›
  ‹U+A880…U+A8DF \p{InSaurashtra}›
  ‹U+A8E0…U+A8FF \p{InDevanagariExtended}›
  ‹U+A900…U+A92F \p{InKayahLi}›
  ‹U+A930…U+A95F \p{InRejang}›
  ‹U+A960…U+A97F \p{InHangulJamoExtended-A}›
  ‹U+A980…U+A9DF \p{InJavanese}›
  ‹U+AA00…U+AA5F \p{InCham}›
  ‹U+AA60…U+AA7F \p{InMyanmarExtended-A}›
  ‹U+AA80…U+AADF \p{InTaiViet}›
  ‹U+AAE0…U+AAFF \p{InMeeteiMayekExtensions}›
  ‹U+AB00…U+AB2F \p{InEthiopicExtended-A}›
  ‹U+ABC0…U+ABFF \p{InMeeteiMayek}›
  ‹U+AC00…U+D7AF \p{InHangulSyllables}›
  ‹U+D7B0…U+D7FF \p{InHangulJamoExtended-B}›
  ‹U+D800…U+DB7F \p{InHighSurrogates}›
  ‹U+DB80…U+DBFF \p{InHighPrivateUseSurrogates}›
  ‹U+DC00…U+DFFF \p{InLowSurrogates}›
  ‹U+E000…U+F8FF \p{InPrivateUseArea}›
  ‹U+F900…U+FAFF \p{InCJKCompatibilityIdeographs}›
  ‹U+FB00…U+FB4F \p{InAlphabeticPresentationForms}›
  ‹U+FB50…U+FDFF \p{InArabicPresentationForms-A}›
  ‹U+FE00…U+FE0F \p{InVariationSelectors}›
  ‹U+FE10…U+FE1F \p{InVerticalForms}›
  ‹U+FE20…U+FE2F \p{InCombiningHalfMarks}›
  ‹U+FE30…U+FE4F \p{InCJKCompatibilityForms}›
  ‹U+FE50…U+FE6F \p{InSmallFormVariants}›
  ‹U+FE70…U+FEFF \p{InArabicPresentationForms-B}›
  ‹U+FF00…U+FFEF \p{InHalfwidthandFullwidthForms}›
  ‹U+FFF0…U+FFFF \p{InSpecials}›

See also: lists:keyfind/3.
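
A sketch of how such a lookup could work with lists:keyfind/3, using two entries from the table above. The module name unicode_block_sketch, the helper blocks/0, the key format, and the \x{...} range format are all assumptions; the actual function may accept and return different forms.

```erlang
-module(unicode_block_sketch).
-export([unicode_block/1]).

%% Hypothetical excerpt of the block table: {BlockName, Range}.
blocks() ->
    [{"InGreekandCoptic", "\\x{0370}-\\x{03FF}"},
     {"InCyrillic",       "\\x{0400}-\\x{04FF}"}].

%% Look up the code point range for BlockName, or return nomatch.
unicode_block(BlockName) ->
    case lists:keyfind(BlockName, 1, blocks()) of
        {BlockName, Range} -> Range;
        false -> nomatch
    end.
```

The returned range can be placed inside a character class and used with the unicode option, for example re:run(Text, "[" ++ unicode_block_sketch:unicode_block("InCyrillic") ++ "]+", [unicode, {capture, first, list}]).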


Generated by EDoc