StringUtils

The most common string processing and modification utilities

package

Default

Methods

Compares two strings and returns the number of character edits (replacements, insertions or deletions) that must be performed to convert one of the strings into the other. This is useful in fuzzy text searches where we want to look for similar texts. The comparison uses the Levenshtein distance:

compareByLevenshtein(string $string1, string $string2) : \org\turbocommons\src\main\php\utils\number
static

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform string1 into string2. The complexity of the algorithm is O(m*n), where m and n are the lengths of string1 and string2.

example

"aha" and "aba" will output 1 because we need to change the h for a b to transform one string into the other.

Arguments

$string1

string

The first string to compare

$string2

string

The second string to compare

Response

\org\turbocommons\src\main\php\utils\number

The number of characters to replace to convert $string1 into $string2 where 0 means both strings are the same. The higher the result, the more different the strings are.
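
As a rough illustration of the distance described above, PHP ships a built-in levenshtein() function that computes the same metric on single-byte strings. Whether compareByLevenshtein delegates to it is not stated on this page, so read this as a sketch of the concept and not as the method's implementation.

```php
<?php
// PHP's built-in levenshtein() returns the minimal number of
// insertions, deletions and replacements between two strings.
$same = levenshtein('aba', 'aba');              // identical strings
$oneEdit = levenshtein('aha', 'aba');           // replace 'h' with 'b'
$threeEdits = levenshtein('kitten', 'sitting'); // two replacements, one insertion

echo $same . ' ' . $oneEdit . ' ' . $threeEdits . PHP_EOL; // prints "0 1 3"
```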

Calculates the percentage of similarity between two strings, based on the Levenshtein distance. This is useful in fuzzy text searches where we want to look for similar texts.

compareSimilarityPercent(string $string1, string $string2) : \org\turbocommons\src\main\php\utils\number
static

Arguments

$string1

string

The first string to compare

$string2

string

The second string to compare

Response

\org\turbocommons\src\main\php\utils\number

A number between 0 and 100, where 100 means both strings are the same and 0 means both strings are totally different
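
One plausible way to turn a Levenshtein distance into a 0 to 100 score is to normalize it by the length of the longer string. The formula and the helper name sketchSimilarityPercent below are assumptions made for illustration; this page does not state how compareSimilarityPercent computes its result.

```php
<?php
// Hypothetical helper: NOT the library's implementation.
// Assumes similarity = (1 - distance / longest length) * 100.
function sketchSimilarityPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));

    // Two empty strings are identical by definition
    if ($longest === 0) {
        return 100.0;
    }

    return (1 - levenshtein($a, $b) / $longest) * 100;
}

echo sketchSimilarityPercent('abc', 'abc') . PHP_EOL;   // prints 100
echo sketchSimilarityPercent('abc', 'xyz') . PHP_EOL;   // prints 0
echo sketchSimilarityPercent('abcd', 'abcf') . PHP_EOL; // prints 75
```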

Count the number of capital letters on the given string

countCapitalLetters(string $string) : integer
static

Arguments

$string

string

The string whose capital letters will be counted

Response

integer

The number of capital letters that are present on the string
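
A minimal sketch of this kind of count, assuming that "capital letter" means any Unicode uppercase letter. The helper name sketchCountCapitalLetters is hypothetical and the library may define capitals differently.

```php
<?php
// Hypothetical helper, not the library code.
// \p{Lu} matches any Unicode uppercase letter; the 'u' flag enables UTF-8 mode.
function sketchCountCapitalLetters(string $string): int
{
    return (int) preg_match_all('/\p{Lu}/u', $string);
}

echo sketchCountCapitalLetters('Hello World ABC') . PHP_EOL; // prints 5
echo sketchCountCapitalLetters('no capitals') . PHP_EOL;     // prints 0
```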

Given a string with a list of elements separated by '/' or '\' that represent some arbitrary path structure, this method will return the number of elements that are listed on the path.

countPathElements(string $path) : \org\turbocommons\src\main\php\utils\number
static
example

-> results in 1
"//folder/folder2/folder3/file.txt" -> results in 4

Arguments

$path

string

A string containing some arbitrary path.

Response

\org\turbocommons\src\main\php\utils\number

The number of elements that are listed on the provided path
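
The counting rule can be sketched by splitting the path on either separator and ignoring empty fragments. The helper sketchCountPathElements is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical helper illustrating the counting rule; not the library code.
function sketchCountPathElements(string $path): int
{
    // Split on '/' or '\' and discard the empty fragments produced
    // by leading, trailing or duplicate separators.
    $parts = preg_split('#[/\\\\]+#', $path, -1, PREG_SPLIT_NO_EMPTY);

    return count($parts);
}

echo sketchCountPathElements('//folder/folder2/folder3/file.txt') . PHP_EOL; // prints 4
echo sketchCountPathElements('folder\\folder2') . PHP_EOL;                   // prints 2
```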

Count the number of times a string is found inside another string

countStringOccurences(string $string, string $findMe) : integer
static

Arguments

$string

string

The string where we want to search

$findMe

string

The string that we want to look for

Response

integer

The number of times that $findMe appears on $string
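
For comparison, PHP's built-in substr_count() performs a similar count. Note that it counts non-overlapping occurrences only; whether countStringOccurences treats overlapping matches the same way is not stated here.

```php
<?php
// substr_count() returns how many times the needle appears in the haystack
$twice = substr_count('hello world, hello', 'hello');

// Overlapping matches are not counted: 'aaa' holds only one non-overlapping 'aa'
$once = substr_count('aaa', 'aa');

echo $twice . ' ' . $once . PHP_EOL; // prints "2 1"
```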

Count the number of words that exist on the given string

countWords(string $string, string $wordSeparator = ' ') : integer
static

Arguments

$string

string

The string whose words will be counted

$wordSeparator

string

' ' by default. The character that is considered as the word separator

Response

integer

The number of words (elements divided by the wordSeparator value) that are present on the string
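
A short sketch of counting elements divided by a separator. It assumes that empty fragments (for example those produced by two consecutive separators) do not count as words; the helper name sketchCountWords is hypothetical.

```php
<?php
// Hypothetical helper, not the library code.
function sketchCountWords(string $string, string $wordSeparator = ' '): int
{
    $count = 0;

    // explode() requires a non-empty separator
    foreach (explode($wordSeparator, $string) as $fragment) {
        // Assumption: fragments that are empty or whitespace-only are not words
        if (trim($fragment) !== '') {
            $count++;
        }
    }

    return $count;
}

echo sketchCountWords('one two  three') . PHP_EOL; // prints 3
echo sketchCountWords('a,b,c', ',') . PHP_EOL;     // prints 3
```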

Find the string that is most similar to a provided one inside an array of strings.

findMostSimilarString(string $string, array $listOfStrings) : string
static

NOTE: The strings are compared by using the Levenshtein method.

see \org\turbocommons\src\main\php\utils\StringUtils::compareByLevenshtein

Arguments

$string

string

The string that we want to compare with all of the provided array

$listOfStrings

array

An array of strings with all the strings we want to compare

Response

string

The string that was found to be most similar to the provided one

Find the array index that contains the string that is most similar to a provided one inside an array of strings.

findMostSimilarStringIndex(string $string, array $listOfStrings) : integer
static

NOTE: The strings are compared by using the Levenshtein method.

see \org\turbocommons\src\main\php\utils\StringUtils::compareByLevenshtein

Arguments

$string

string

The string that we want to compare with all of the provided array

$listOfStrings

array

An array of strings with all the strings we want to compare

Response

integer

The array index for the string that was found to be most similar to the provided one
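
The search described by findMostSimilarString and findMostSimilarStringIndex can be sketched as a linear scan that keeps the candidate with the smallest Levenshtein distance. The helper name, the use of PHP's built-in levenshtein(), the tie-breaking rule (first match wins) and the -1 result for an empty list are all assumptions.

```php
<?php
// Hypothetical helper, not the library code.
function sketchFindMostSimilarIndex(string $string, array $listOfStrings): int
{
    $bestIndex = -1; // assumption: -1 signals an empty list
    $bestDistance = PHP_INT_MAX;

    foreach ($listOfStrings as $index => $candidate) {
        $distance = levenshtein($string, $candidate);

        // Strict comparison keeps the first candidate when distances tie
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $bestIndex = $index;
        }
    }

    return $bestIndex;
}

$list = ['apple', 'orange', 'banana'];
$index = sketchFindMostSimilarIndex('bananna', $list);

echo $index . ' ' . $list[$index] . PHP_EOL; // prints "2 banana"
```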

Full text search is the name for the process of searching a large text content for a string containing some text to find.

formatForFullTextSearch(string $string, string $wordSeparator = ' ') : string
static

This method will process a text so it removes all the accents and non alphanumerical characters that are not useful for searching on strings, and converts everything to lower case. To perform the search it is important that both the search string and the searched string are standardized the same way, to maximize possible matches.

Arguments

$string

string

String to process

$wordSeparator

string

The character that will be used as the word separator. By default it is the empty space character ' '

Response

string

The resulting string

Given a string with a list of elements separated by '/' or '\' that represent some kind of unformatted path, this method will process it to get a standardized one by applying the following rules:

formatPath(string $path, string $separator = '/') : string
static
  • Duplicate separator characters will be removed: "a\\b\c" will become "a/b/c"
  • All separator characters will be unified to the same one: "a\b/c\d" will become "a/b/c/d"
  • No trailing separator will exist: "a\b\c\" will become "a/b/c"

NOTE: This method only applies format to the received string. It does not check if the path is a real location or a valid url, nor will it fail if the received path contains strange characters or is invalid.

Arguments

$path

string

A raw path to be formatted

$separator

string

The character to use as the element divider. Only slash '/' or backslash '\' are allowed.

Response

string

The correctly formatted path without any trailing separator
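
The three formatting rules can be sketched with plain string functions. This is an illustration under stated assumptions and not the library's code: the helper name sketchFormatPath is hypothetical, and keeping a path that consists of a single separator untouched is a choice made here.

```php
<?php
// Hypothetical helper, not the library code.
function sketchFormatPath(string $path, string $separator = '/'): string
{
    if ($separator !== '/' && $separator !== '\\') {
        throw new InvalidArgumentException('separator must be a slash or a backslash');
    }

    // Rule: unify all separator characters to the requested one
    $formatted = str_replace(['/', '\\'], $separator, $path);

    // Rule: remove duplicate separators
    $double = $separator . $separator;
    while (strpos($formatted, $double) !== false) {
        $formatted = str_replace($double, $separator, $formatted);
    }

    // Rule: no trailing separator (assumption: a lone separator is kept)
    if (strlen($formatted) > 1 && substr($formatted, -1) === $separator) {
        $formatted = substr($formatted, 0, -1);
    }

    return $formatted;
}

echo sketchFormatPath('a\\\\b\\c') . PHP_EOL;  // prints a/b/c
echo sketchFormatPath('a\\b/c\\d') . PHP_EOL;  // prints a/b/c/d
echo sketchFormatPath('a\\b\\c\\') . PHP_EOL;  // prints a/b/c
```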

TODO - copy from Ts

formatUrl() 
static

Generates a random string with the specified length and options

generateRandom(integer $minLength, integer $maxLength, array $charSet = array('0-9', 'a-z', 'A-Z')) : string
static

Arguments

$minLength

integer

Specify the minimum possible length for the generated string

$maxLength

integer

Specify the maximum possible length for the generated string

$charSet

array

Defines the list of possible characters to be generated. Each element of charSet must be a string containing the possible characters like 'a1kjuhAO' or a range like 'a-z', 'A-D', '0-5', etc. Note that the - character must be escaped (\-) when not specified as part of a range. The default charset is alphanumeric: ['0-9', 'a-z', 'A-Z']

Response

string

A randomly generated string

TODO

getDomainFromUrl( $string) 
static

Arguments

$string

TODO

getHostNameFromUrl( $string) 
static

Arguments

$string

Generates an array with a list of common words from the specified text.

getKeyWords(string $string, string $max = 25, string $longerThan = 3, string $shorterThan = 15, string $ignoreNumericWords = false) : array
static

The list will be sorted so the words that appear more times on the string are placed first.

Arguments

$string

string

Piece of text that contains the keywords we want to extract

$max

string

The maximum number of keywords that will appear on the result. If set to null, all unique words on the given text will be returned

$longerThan

string

The minimum number of chars for the keywords to find. This is useful to filter some irrelevant words like: the, it, me, ...

$shorterThan

string

The maximum number of chars for the keywords to find

$ignoreNumericWords

string

Tells the method to skip words that represent numeric values on the result. False by default

Response

array

The list of keywords that have been extracted from the given text

Extracts all the lines from the given string and outputs an array with each line as an element.

getLines(string $string, array $filters = array('/\s+/')) : array
static

It does not matter which line separator has been used (Windows: \r\n, Linux/Unix: \n, Mac: \r). All source lines will be correctly extracted.

Arguments

$string

string

Text containing one or more lines that will be converted to an array with each line on a different element.

$filters

array

One or more regular expressions that will be used to filter unwanted lines. Lines that match any of the filters will be excluded from the result. By default, all empty lines are ignored (those containing only newlines, blanks, tabs, etc.).

Response

array

A list with all the string lines separated as different array elements.
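
Splitting on all three line separator styles can be sketched with a single regular expression. The helper name sketchGetLines is hypothetical, and this sketch hard-codes the "ignore empty lines" behaviour instead of accepting a list of filters.

```php
<?php
// Hypothetical helper, not the library code.
function sketchGetLines(string $string): array
{
    // \r\n must be tried first so a Windows line break is not split twice
    $lines = preg_split('/\r\n|\r|\n/', $string);

    // Drop lines that are empty or contain only whitespace, then reindex
    return array_values(array_filter($lines, function (string $line): bool {
        return trim($line) !== '';
    }));
}

$lines = sketchGetLines("first\r\nsecond\n\n  \rthird");

echo implode('|', $lines) . PHP_EOL; // prints first|second|third
```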

Given a string with a list of elements separated by '/' or '\' that represent some arbitrary path structure, this method will format the specified path, remove the requested number of path elements from its right side, and return the path without those elements.

getPath(string $path, integer $elementsToRemove = 1, string $separator = '/') : string
static

This method can be used with Operating system file paths, urls, or any other string that uses the 'slash separated' format to encode a path.

example

"//folder/folder2\folder3\file.txt" -> results in "/folder/folder2/folder3" if elementsToRemove = 1
"//folder/folder2\folder3\file.txt" -> results in "/folder/folder2" if elementsToRemove = 2

see \org\turbocommons\src\main\php\utils\StringUtils::formatPath

Arguments

$path

string

A string containing some arbitrary path.

$elementsToRemove

integer

(one by default) The number of elements that we want to remove from the right side of the path.

$separator

string

The character to use as the element divider for the returned path. Only slash '/' or backslash '\' are allowed.

Response

string

The received path without the specified number of elements and correctly formatted
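
The examples above are consistent with splitting the path into elements, dropping the last ones and joining the rest. The sketch below follows that reading; the helper name sketchGetPath is hypothetical, and keeping a single leading separator when the input starts with one is inferred from the documented results, not confirmed by the source.

```php
<?php
// Hypothetical helper, not the library code.
function sketchGetPath(string $path, int $elementsToRemove = 1, string $separator = '/'): string
{
    // Split on '/' or '\' ignoring empty fragments
    $parts = preg_split('#[/\\\\]+#', $path, -1, PREG_SPLIT_NO_EMPTY);

    // Keep everything except the requested number of rightmost elements
    $kept = array_slice($parts, 0, max(0, count($parts) - $elementsToRemove));
    $result = implode($separator, $kept);

    // Inferred from the examples: a leading separator survives, reduced to one
    if ($path !== '' && ($path[0] === '/' || $path[0] === '\\')) {
        $result = $separator . $result;
    }

    return $result;
}

$raw = '//folder/folder2\\folder3\\file.txt';

echo sketchGetPath($raw, 1) . PHP_EOL; // prints /folder/folder2/folder3
echo sketchGetPath($raw, 2) . PHP_EOL; // prints /folder/folder2
```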

Given a string with a list of elements separated by '/' or '\' that represent some arbitrary path structure, this method will return the element that is located at the requested position. If no position is defined, the last element of the path (the rightmost one) will be returned by default.

getPathElement(string $path, integer $position = -1) : string
static

This method can be used with Operating system file paths, urls, or any other string that uses the 'slash separated' format to encode a path.

example

"//folder/folder2\folder3\file.txt" -> results in "file.txt" if (-1) position is defined
"//folder/folder2\folder3\file.txt" -> results in "folder" if position 0 is defined
"//folder/folder2\folder3\file.txt" -> results in "folder3" if position 2 is defined
"//folder/folder2\folder3\file.txt" -> results in "folder3" if position -2 is defined
"//folder/folder2\folder3\file.txt" -> results in "folder2" if position -3 is defined

Arguments

$path

string

A string containing some arbitrary path.

$position

integer

The index for the element that we want to extract from the path. Positive values count path elements from the left side, with 0 being the leftmost one. Negative values count path elements from the right side, with -1 being the last (rightmost) path element. If not specified, the last one will be returned.

Response

string

The element at the specified path position or the last one if no position is defined
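
The positive and negative indexing rules can be sketched as follows. The helper name sketchGetPathElement is hypothetical, and throwing an exception for an out-of-range position is an assumption, since this page does not describe that case.

```php
<?php
// Hypothetical helper, not the library code.
function sketchGetPathElement(string $path, int $position = -1): string
{
    // Split on '/' or '\' ignoring empty fragments
    $parts = preg_split('#[/\\\\]+#', $path, -1, PREG_SPLIT_NO_EMPTY);

    // Negative positions count from the right: -1 is the last element
    $index = $position < 0 ? count($parts) + $position : $position;

    if (!isset($parts[$index])) {
        // Assumption: the behaviour for invalid positions is not documented here
        throw new OutOfRangeException('Invalid position: ' . $position);
    }

    return $parts[$index];
}

$raw = '//folder/folder2\\folder3\\file.txt';

echo sketchGetPathElement($raw) . PHP_EOL;     // prints file.txt
echo sketchGetPathElement($raw, 0) . PHP_EOL;  // prints folder
echo sketchGetPathElement($raw, -3) . PHP_EOL; // prints folder2
```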

This method works in the same way as getPathElement but it also removes the extension part from the result if it has any.

getPathElementWithoutExt(string $path, integer $position = -1, string $extensionSeparator = '.') : string
static
example

"//folder/folder2\folder3\file.txt" -> results in "file" if position = -1. Notice that ".txt" extension is removed
"//folder/folder2\folder3\file.txt" -> results in "folder3" if position = 2. "folder3" has no extension so it does not get modified.

see \org\turbocommons\src\main\php\utils\StringUtils::getPathElement

Arguments

$path

string

A string containing some arbitrary path.

$position

integer

The index for the element that we want to extract from the path. If not specified, the last one will be returned.

$extensionSeparator

string

The character to be used as the extension separator. The most commonly used is '.'

Response

string

The element at the specified path position with its extension removed, or the last one if no position is defined

This method works in the same way as getPathElement but it only gives the element extension if it has any.

getPathExtension(string $path, integer $position = -1, string $extensionSeparator = '.') : string
static
example

"//folder/folder2\folder3\file.txt" -> results in "txt" if position = -1. Notice that the extension is returned without the separator character
"//folder/folder2\folder3\file.txt" -> results in "" if position = 2. "folder3" has no extension so an empty string is returned.

see \org\turbocommons\src\main\php\utils\StringUtils::getPathElement

Arguments

$path

string

A string containing some arbitrary path.

$position

integer

The index for the element extension that we want to extract from the path. If not specified, the last one will be returned.

$extensionSeparator

string

The character to be used as the extension separator. The most commonly used is '.'

Response

string

The extension from the element at the specified path position or the extension from the last one if no position is defined
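
Both getPathElementWithoutExt and getPathExtension come down to splitting one path element at its extension separator. A minimal sketch of that split follows; the helper name is hypothetical, and splitting at the last separator as well as returning an empty extension when none exists are assumptions.

```php
<?php
// Hypothetical helper, not the library code.
// Returns [name without extension, extension without separator].
function sketchSplitExtension(string $element, string $extensionSeparator = '.'): array
{
    // Assumption: the extension starts after the LAST separator
    $pos = strrpos($element, $extensionSeparator);

    if ($pos === false) {
        // Assumption: no separator means no extension
        return [$element, ''];
    }

    return [
        substr($element, 0, $pos),
        substr($element, $pos + strlen($extensionSeparator)),
    ];
}

[$name, $extension] = sketchSplitExtension('file.txt');

echo $name . ' ' . $extension . PHP_EOL; // prints "file txt"
```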

TODO - translate from Ts

getSchemeFromUrl() 
static

Test if a given string is written using the camel case format or not.

isCamelCase(string $string, string $type = self::FORMAT_CAMEL_CASE) : boolean
static

3 variants can be checked: Default one that does not care about the first letter case, and Upper or Lower camel case formats which force it to be upper case and lower case respectively.

see \org\turbocommons\src\main\php\utils\StringUtils::FORMAT_CAMEL_CASE \org\turbocommons\src\main\php\utils\StringUtils::FORMAT_UPPER_CAMEL_CASE \org\turbocommons\src\main\php\utils\StringUtils::FORMAT_LOWER_CAMEL_CASE

Arguments

$string

string

The string to be tested

$type

string

The variant of camel case we are testing: StringUtils::FORMAT_UPPER_CAMEL_CASE, StringUtils::FORMAT_LOWER_CAMEL_CASE or StringUtils::FORMAT_CAMEL_CASE (default).

Response

boolean

True if the given string is accepted as camel case for the specified variant.

Tells if a specified string is semantically empty, which applies to any string that is comprised of empty spaces, new line characters, tabulations or any other characters without a visually semantic value to the user.

isEmpty(string $string, array $emptyChars = array()) : boolean
static

Example 1: The following strings are considered as empty: " ", "", " \n\n\n", " \t\t\n"

Example 2: The following strings are not considered as empty: "hello", " a", " \n\nB"

Arguments

$string

string

The text to check

$emptyChars

array

Custom list of strings that will be also considered as empty characters. For example, we can define 'NULL' and '' as empty string values by setting this to ['NULL', '']

Response

boolean

false if the string is not empty, true if the string contains only characters without semantic value or any other characters defined as "empty" values
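
The idea of a "semantically empty" string can be sketched by removing the custom empty fragments and all whitespace, then checking whether anything remains. The helper name sketchIsEmpty is hypothetical and the library may treat more characters as empty than this sketch does.

```php
<?php
// Hypothetical helper, not the library code.
function sketchIsEmpty(string $string, array $emptyChars = []): bool
{
    // Remove every custom fragment that should count as empty
    $remaining = str_replace($emptyChars, '', $string);

    // Remove spaces, tabs and new lines; nothing left means empty
    return preg_replace('/\s+/', '', $remaining) === '';
}

var_dump(sketchIsEmpty(" \t\t\n"));            // bool(true)
var_dump(sketchIsEmpty(" \n\nB"));             // bool(false)
var_dump(sketchIsEmpty(' NULL ', ['NULL']));   // bool(true)
```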

Test if a given string is written using the snake case format or not.

isSnakeCase(string $string, string $type = self::FORMAT_SNAKE_CASE) : boolean
static

3 variants can be checked: Default one that does not care about the text case, and Upper or Lower snake case formats which force it to be upper case and lower case respectively.

see \org\turbocommons\src\main\php\utils\StringUtils::FORMAT_SNAKE_CASE \org\turbocommons\src\main\php\utils\StringUtils::FORMAT_UPPER_SNAKE_CASE \org\turbocommons\src\main\php\utils\StringUtils::FORMAT_LOWER_SNAKE_CASE

Arguments

$string

string

The string to be tested

$type

string

The variant of snake case we are testing: StringUtils::FORMAT_UPPER_SNAKE_CASE, StringUtils::FORMAT_LOWER_SNAKE_CASE or StringUtils::FORMAT_SNAKE_CASE (default).

Response

boolean

True if the given string is accepted as snake case for the specified variant.

Tells if the given value is a string or not

isString(mixed $value) : boolean
static

Arguments

$value

mixed

A value to check

Response

boolean

true if the given value is a string, false otherwise

Tells if the given string is a valid url or not

isUrl(mixed $value) : boolean
static

Arguments

$value

mixed

The value to check

Response

boolean

False in case the validation fails or true if validation succeeds.

Method that limits the length of a string and optionally appends informative characters like ' ...' to inform that the original string was longer.

limitLen(string $string, integer $limit = 100, string $limiterString = ' ...') : string
static

Arguments

$string

string

String to limit

$limit

integer

Max number of characters

$limiterString

string

If the specified text exceeds the specified limit, the value of this parameter will be added to the end of the result. The value is ' ...' by default.

Response

string

The specified string, limited in length if necessary. The final result will never exceed the specified limit, even with the limiterString appended.
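
A sketch of the truncation rule, in which the limiter string counts towards the limit. The helper name sketchLimitLen is hypothetical, lengths are measured in bytes (multibyte text would need the mb_* functions), and the handling of a limiter longer than the limit is an assumption.

```php
<?php
// Hypothetical helper, not the library code.
function sketchLimitLen(string $string, int $limit = 100, string $limiterString = ' ...'): string
{
    // Short enough already: return it untouched
    if (strlen($string) <= $limit) {
        return $string;
    }

    $limiterLength = strlen($limiterString);

    // Assumption: if the limiter alone exceeds the limit, cut the limiter
    if ($limiterLength > $limit) {
        return substr($limiterString, 0, $limit);
    }

    // Reserve room for the limiter so the total never exceeds $limit
    return substr($string, 0, $limit - $limiterLength) . $limiterString;
}

echo sketchLimitLen('hello world', 8) . PHP_EOL; // prints "hell ..."
echo sketchLimitLen('short', 10) . PHP_EOL;      // prints "short"
```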

Converts all accented characters in a given string to ASCII characters. This method is based on the WordPress implementation called remove_accents

removeAccents(string $string) : string
static
see https://core.trac.wordpress.org/browser/tags/3.9/src/wp-includes/formatting.php#L682

Arguments

$string

string

Text from which accents must be cleaned

Response

string

The given string with all accents and diacritics replaced by the respective ASCII characters.

Remove all html code and tags from the specified text, so it gets converted to plain text.

removeHtmlCode(string $string, string $allowedTags = '') : string
static

Arguments

$string

string

The string to process

$allowedTags

string

You can use this optional second parameter to specify tags which should not be stripped. Example: '<p><a><b><li><br><u>' will preserve the specified tags

Response

string

The string without the html code

Deletes all new line characters from the given string

removeNewLineCharacters(string $string) : string
static

Arguments

$string

string

The string to process

Response

string

The string without any new line character

Remove all duplicate consecutive fragments from the provided string

removeSameConsecutive(string $string, array $set = array()) : string
static
example

If we want to remove all duplicate consecutive empty spaces, we will call removeSameConsecutive('string', [' '])

If we want to remove all duplicate consecutive new line characters, we will call removeSameConsecutive("string\n\n\nstring", ["\n"])

If we want to remove all duplicate "hello" words, we will call removeSameConsecutive('hellohellohellohello', ['hello'])

Arguments

$string

string

The string to process

$set

array

A list with the fragments that will be removed when found consecutive. If this value is an empty array, all duplicate consecutive characters will be deleted. We can pass here words or special characters like "\n"

Response

string

The string where every run of consecutive repetitions of a fragment from the provided set has been reduced to a single occurrence
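
Collapsing consecutive repetitions can be sketched with one regular expression per fragment. The helper name sketchRemoveSameConsecutive is hypothetical, and the sketch covers only the case where a non-empty set is given; the "empty set means all duplicate characters" behaviour is not implemented here.

```php
<?php
// Hypothetical helper, not the library code.
function sketchRemoveSameConsecutive(string $string, array $set): string
{
    foreach ($set as $fragment) {
        if ($fragment === '') {
            continue; // an empty fragment cannot repeat meaningfully
        }

        // Match two or more consecutive copies of the fragment
        $pattern = '/(?:' . preg_quote($fragment, '/') . '){2,}/';

        // A callback avoids any special handling of the replacement text
        $string = preg_replace_callback(
            $pattern,
            function (array $matches) use ($fragment): string {
                return $fragment;
            },
            $string
        );
    }

    return $string;
}

echo sketchRemoveSameConsecutive('a   b', [' ']) . PHP_EOL;                    // prints "a b"
echo sketchRemoveSameConsecutive('hellohellohellohello', ['hello']) . PHP_EOL; // prints "hello"
```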

Remove all urls from the given string

removeUrls(string $string, string $replacement = 'xxxx') : string
static

Arguments

$string

string

The string to process

$replacement

string

The replacement string that will be shown when some url is removed

Response

string

The string without the urls

Deletes from a string all the words that are longer than the specified length

removeWordsLongerThan(string $string, integer $longerThan = 3, string $wordSeparator = ' ') : string
static

Arguments

$string

string

The string to process

$longerThan

integer

The maximum length for the words to be preserved. Any word that exceeds the specified length will be removed from the string.

$wordSeparator

string

The character that will be used as the word separator. By default it is the empty space character ' '

Response

string

The string without the removed words

Deletes from a string all the words that are shorter than the specified length

removeWordsShorterThan(string $string, integer $shorterThan = 3, string $wordSeparator = ' ') : string
static

Arguments

$string

string

The string to process

$shorterThan

integer

The minimum length for the words to be preserved. Any word that is shorter than the specified value will be removed.

$wordSeparator

string

The character that will be used as the word separator. By default it is the empty space character ' '

Response

string

The string without the removed words

TODO docs TODO Verify that this version works exactly the same as the TS one, and implement the same unit tests

replace( $string,  $search,  $replacement,  $count = -1) 
static

Arguments

$string

$search

$replacement

$count

TODO translate from TS

trim() 
static

Remove whitespaces (or any custom set of characters) from a string left side

trimLeft(string $string, string $characters = " \n\r") : string
static
example

StringUtils::trimLeft("abcXXabc", "abc") outputs "XXabc"

Arguments

$string

string

A string to process

$characters

string

A set of characters that will be trimmed from string left side. By default, empty space and new line characters are defined : " \n\r"

Response

string

The trimmed string
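
PHP's built-in ltrim() behaves the same way on the documented example: the second argument is a set of characters, not a prefix. Whether trimLeft is built on ltrim() is not stated here, so this only illustrates the character-set semantics.

```php
<?php
// Every leading character found in the set "abc" is stripped,
// stopping at the first character that is not in the set.
$custom = ltrim('abcXXabc', 'abc');

// Same call with the default set documented for trimLeft
$default = ltrim("  \n hello ", " \n\r");

echo $custom . PHP_EOL;  // prints XXabc
echo $default . PHP_EOL; // prints "hello " (right side untouched)
```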

TODO translate from TS

trimRight() 
static

Constants

Defines the sentence case format (Only the first character of the sentence is capitalised, except for proper nouns and other words which are required by a more specific rule to be capitalised).

FORMAT_SENTENCE_CASE

Generally equivalent to the baseline universal standard of formal English orthography

Defines the start case format (The first character in all words capitalised and all the rest of the word lower case). It is also called Title Case

FORMAT_START_CASE

Defines the all upper case format (All letters on a string written with Capital letters only)

FORMAT_ALL_UPPER_CASE

Defines the all lower case format (All letters on a string written with lower case letters only)

FORMAT_ALL_LOWER_CASE

Defines the first upper rest lower case format (All letters on a string written with lower case letters except the first one which is Capitalized)

FORMAT_FIRST_UPPER_REST_LOWER

Defines the CamelCase format (the practice of writing compound words or phrases such that each word or abbreviation begins with a capital letter)

FORMAT_CAMEL_CASE

Defines the UpperCamelCase format variation that writes first letter as upper case

FORMAT_UPPER_CAMEL_CASE

Defines the lowerCamelCase format variation that writes first letter as lower case

FORMAT_LOWER_CAMEL_CASE

Defines the snake_case format (the practice of writing compound words or phrases in which the elements are separated with one underscore character (_) and no spaces)

FORMAT_SNAKE_CASE

Defines the UPPER_SNAKE_CASE format variation that writes all letters as upper case

FORMAT_UPPER_SNAKE_CASE

Defines the lower_snake_case format variation that writes all letters as lower case

FORMAT_LOWER_SNAKE_CASE