Extracting substrings with StringUtils from the Apache library

Following on from some of my earlier posts, where I described some useful utilities from the Apache Commons Lang library (the org.apache.commons.lang3 package), like the very nice RandomStringUtils class, this time I will focus on the StringUtils class. As the name suggests, it gives developers and testers an easy way to handle common String operations. The class is quite large, so in this post I will cover only the ‘substring’ category: the methods that extract a substring from a larger string, based on some conditions.

I find this class very useful when working with Strings. Many times I had to extract a substring that appeared before a certain group of characters, or between two groups of characters, and the classic way of doing that took too much code and reasoning. The people who implemented the Apache library probably thought the same, because they came up with methods that extract such substrings in a single line of code.

Let’s take a look at the methods and what scenarios they cover, together with some usage examples:

The setup

Before using the substring methods, if you haven’t already imported the Apache library, do so by adding a dependency on it to your pom.xml file, preferably on the latest version. To find all the available versions, search for the library in the Maven Repository (http://search.maven.org/) or follow this direct link (http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.commons%22%20AND%20a%3A%22commons-lang3%22). The example below uses version 3.4:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.4</version>
</dependency>

Then, in your test, add one of the following imports:
– if you want to import quite a large number of methods, you can use an “import all” strategy, so the following code should be used:

import static org.apache.commons.lang3.StringUtils.*;

– if you want to import a smaller number of methods, you should import them explicitly, like:

import static org.apache.commons.lang3.StringUtils.substring;

Since the import is static, the test can call the method directly. Without a static import, the call would need the class name in front, as in StringUtils.methodName(parameters…), which does not look as pretty:

substring("This is my string", 1);

The methods

Here is a description of most of the substring related methods from the StringUtils class, except for two which I did not find very useful. There is a short summary afterwards, more like a light comparison between these methods.

substring(String str, int start)

This method extracts a substring of the initial string (the first parameter), starting at the position given by the second parameter. The character at that position is included in the result. Note that the first character of the initial string is at index 0, so passing 0 as the position returns the entire string. The method only cares about where the extraction starts, and returns all the text from there to the end of the string. If the position is larger than the length of the initial string, an empty String is returned instead. Usage examples:

substring("This is my string", 0) --> “This is my string”

substring("This is my string", 1) --> “his is my string”

substring("This is my string", 100) --> “” (empty string)
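
To make the difference from the classic approach concrete, here is a minimal plain-JDK sketch of the behaviour described above. The class LenientSubstringSketch and the method substringFrom are hypothetical names made up for this illustration, not the library's source, and the sketch assumes non-negative positions like the ones in the examples.

```java
// Hypothetical helper for illustration only; this is not the StringUtils source.
class LenientSubstringSketch {

    // Returns the text from index 'start' (inclusive) to the end of 'str'.
    // Assumption: 'start' is non-negative, as in the usage examples above.
    static String substringFrom(String str, int start) {
        if (str == null) {
            return null; // a null input gives a null result instead of an exception
        }
        if (start >= str.length()) {
            return ""; // a start position past the end gives an empty string
        }
        return str.substring(start);
    }

    public static void main(String[] args) {
        String text = "This is my string";
        System.out.println(substringFrom(text, 1));   // his is my string
        System.out.println(substringFrom(text, 100)); // prints an empty line

        // The plain JDK call is stricter and throws for the same position.
        try {
            text.substring(100);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("String.substring(100) threw " + e.getClass().getSimpleName());
        }
    }
}
```

The point of the comparison: with the plain String.substring call, the out-of-range case needs its own check or a try/catch in the test, while the lenient variant simply returns an empty string.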

substring(String str, int start, int end)

This method receives as its first parameter the string from which you want to extract a substring. The second parameter is the position where the extraction begins, and the character at this position is included in the result. The third parameter is the position where the extraction stops, and the character at that position is not part of the result. The method returns an empty string in two cases: when the start position is larger than the end position (the method evaluates the string from left to right), or when the start position is larger than the length of the string. Note that if the end position is larger than the length of the string, the result contains all the characters from the start position on, including the last character of the initial string. Usage examples:

substring("This is my string", 0, 3) --> “Thi”

substring("This is my string", 1, 3) --> “hi”

substring("This is my string", 3, 100) --> “s is my string”

substring("This is my string", 100, 3) --> “” (empty string)

substring("This is my string", 3, 1) --> “” (empty string)   
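
The rules for the start and end positions can be sketched in plain JDK code as follows. This is only my own illustration of the behaviour described above: SubstringRangeSketch and substringRange are hypothetical names, the code is not taken from the library, and it assumes both positions are non-negative.

```java
// Hypothetical helper for illustration only; this is not the StringUtils source.
class SubstringRangeSketch {

    // Returns the text from 'start' (inclusive) to 'end' (exclusive).
    // Assumption: both positions are non-negative, as in the examples above.
    static String substringRange(String str, int start, int end) {
        if (str == null) {
            return null;
        }
        // An end position past the last character is clamped to the string length.
        int clampedEnd = Math.min(end, str.length());
        if (start >= clampedEnd) {
            return ""; // start after end, or start past the end of the string
        }
        return str.substring(start, clampedEnd);
    }

    public static void main(String[] args) {
        String text = "This is my string";
        System.out.println(substringRange(text, 0, 3));           // Thi
        System.out.println(substringRange(text, 3, 100));         // s is my string
        System.out.println(substringRange(text, 3, 1).isEmpty()); // true
    }
}
```

Clamping the end position first keeps the rest of the method down to a single comparison between start and end.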

substringBefore(String str, String separator)

This method receives as its first parameter the string from which you want to extract. The second parameter, named separator, is a string (not just one character, it can be a whole sequence of them) that marks where the extraction stops. The method looks for the first occurrence of the exact separator string and returns everything before it, not including the separator itself. In short: if the separator is first found at position x, everything from the start of the string to position x-1 is returned. If the separator is not found in the initial string, the whole string is returned instead. Usage examples:

substringBefore("This is my string", " ") --> “This”

substringBefore("This is my string", "is") --> “Th”

substringBefore("This is my string", "string") --> “This is my “

substringBefore("This is my string", "not") --> “This is my string”  
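
For comparison, this is roughly what the “classic” version of the same extraction looks like with plain JDK calls. It is a sketch under my own assumptions: SubstringBeforeSketch and before are hypothetical names, and the handling of null arguments is a choice made for this example, not a statement about the library's code.

```java
// Hypothetical helper for illustration only; this is not the StringUtils source.
class SubstringBeforeSketch {

    // Returns everything before the first occurrence of 'separator'.
    static String before(String str, String separator) {
        if (str == null || separator == null) {
            return str; // nothing to search in, or nothing to search for
        }
        int pos = str.indexOf(separator); // position x of the first occurrence
        if (pos == -1) {
            return str; // separator not found: the whole string comes back
        }
        return str.substring(0, pos); // characters 0 .. x-1
    }

    public static void main(String[] args) {
        String text = "This is my string";
        System.out.println(before(text, " "));   // This
        System.out.println(before(text, "is"));  // Th
        System.out.println(before(text, "not")); // This is my string
    }
}
```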

substringAfter(String str, String separator)

This method receives as its first parameter the string from which you want to extract. The second parameter, named separator, is a string (not just one character, it can be a whole sequence of them) that marks where the extraction begins. Once the first occurrence of the separator is found in the initial string, everything to the right of it, not including the separator itself, is returned. In short: if the separator is first found at position x, everything from position x plus the length of the separator (x+1 for a one-character separator) to the end of the string is returned. If the separator is not found in the initial string, an empty string is returned instead. Usage examples:

substringAfter("This is my string", " ") --> “is my string”

substringAfter("This is my string", "is") --> “ is my string”

substringAfter("This is my string", "string") --> “” (empty string)

substringAfter("This is my string", "not") --> “” (empty string)
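
The index arithmetic is the part that is easy to get wrong by hand: for a multi-character separator the result starts at position x plus the length of the separator, not at x+1. Here is a small plain-JDK sketch of that, with the hypothetical names SubstringAfterSketch and after; it is my own illustration, not the library's implementation.

```java
// Hypothetical helper for illustration only; this is not the StringUtils source.
class SubstringAfterSketch {

    // Returns everything after the first occurrence of 'separator'.
    static String after(String str, String separator) {
        if (str == null) {
            return null;
        }
        if (separator == null) {
            return ""; // treated like a separator that was not found
        }
        int pos = str.indexOf(separator); // position x of the first occurrence
        if (pos == -1) {
            return ""; // separator not found: empty string
        }
        // Skip the whole separator, not just one character.
        return str.substring(pos + separator.length());
    }

    public static void main(String[] args) {
        String text = "This is my string";
        // "is" is first found at index 2 and is 2 characters long,
        // so the result starts at index 4 (the space after "This").
        System.out.println("[" + after(text, "is") + "]"); // [ is my string]
        System.out.println("[" + after(text, " ") + "]");  // [is my string]
    }
}
```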

substringBeforeLast(String str, String separator)

This method receives as its first parameter the string from which you want to extract. The second parameter, named separator, is a string (not just one character, it can be a whole sequence of them) that marks where the extraction stops. The method looks for the last occurrence of the exact separator string and returns everything before it, not including the separator itself. In short: if the last occurrence of the separator is at position x, everything from the start of the string to position x-1 is returned. If the separator is not found in the initial string, the whole string is returned instead. Usage examples:

substringBeforeLast("This is my string", " ") --> “This is my”

substringBeforeLast("This is my string", "is") --> “This “

substringBeforeLast("This is my string", "string") --> “This is my “

substringBeforeLast("This is my string", "not") --> “This is my string”

substringAfterLast(String str, String separator)

This method receives as its first parameter the string from which you want to extract. The second parameter, named separator, is a string (not just one character, it can be a whole sequence of them) that marks where the extraction begins. Once the last occurrence of the separator is found in the initial string, everything to the right of it, not including the separator itself, is returned. In short: if the last occurrence of the separator is at position x, everything from position x plus the length of the separator (x+1 for a one-character separator) to the end of the string is returned. If the separator is not found in the initial string, an empty string is returned instead. Usage examples:

substringAfterLast("This is my string", " ") --> “string”

substringAfterLast("This is my string", "is") --> “ my string”

substringAfterLast("This is my string", "string") --> “” (empty string)

substringAfterLast("This is my string", "not") --> “” (empty string)
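
A typical place where the two “last” variants work as a pair is splitting a file name at its last dot. The sketch below does this with plain JDK calls, to show what happens to the indexes. LastSeparatorSketch, beforeLast, afterLast and the file name are all made up for this illustration, and the code is not the library's source.

```java
// Hypothetical helpers for illustration only; this is not the StringUtils source.
class LastSeparatorSketch {

    // Returns everything before the last occurrence of 'separator'.
    static String beforeLast(String str, String separator) {
        if (str == null || separator == null) {
            return str;
        }
        int pos = str.lastIndexOf(separator);
        return pos == -1 ? str : str.substring(0, pos);
    }

    // Returns everything after the last occurrence of 'separator'.
    static String afterLast(String str, String separator) {
        if (str == null) {
            return null;
        }
        if (separator == null) {
            return ""; // treated like a separator that was not found
        }
        int pos = str.lastIndexOf(separator);
        return pos == -1 ? "" : str.substring(pos + separator.length());
    }

    public static void main(String[] args) {
        String fileName = "report.final.pdf"; // made-up example input
        System.out.println(beforeLast(fileName, ".")); // report.final
        System.out.println(afterLast(fileName, "."));  // pdf
    }
}
```

The only difference from the “first occurrence” logic is lastIndexOf instead of indexOf, which is why the not-found cases behave the same way in both families of methods.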

substringBetween(String str, String open, String close)

This method receives as its first parameter the string from which you want to extract. The second parameter, named open, is a string (not just one character, it can be a whole sequence of them) that marks where the extraction begins: right after the first occurrence of open in the initial string. The third parameter, named close, marks where the extraction ends: at the first occurrence of close to the right of the open string. In other words, once the open string is found, the close string is searched for only in the text that follows it. Neither the open nor the close string is included in the result. If the open or the close string cannot be found in the initial string, null is returned instead (an actual null, not the String “null”). Usage examples:


substringBetween("This is my string", " ", " ") --> “is”

substringBetween("This is my string", "is", " ") --> “” (empty string)

substringBetween("This is my string", "is", "is") --> “ ”

substringBetween("This is my string", " ", "is") --> “” (empty string)

substringBetween("This is my string", " ", "my") --> “is ”

substringBetween("This is my string", "This", "string") --> “ is my ”

substringBetween("This is my string", "not", "string") --> null
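
The two-step search (first open, then close only in the text that follows it) explains the less obvious results above, like the empty strings. Here is a plain-JDK sketch of that logic. SubstringBetweenSketch and between are hypothetical names and the code is my own illustration of the described behaviour, not the library's implementation.

```java
// Hypothetical helper for illustration only; this is not the StringUtils source.
class SubstringBetweenSketch {

    // Returns the text between the first 'open' and the first 'close' after it,
    // or null when either of them cannot be found.
    static String between(String str, String open, String close) {
        if (str == null || open == null || close == null) {
            return null;
        }
        int openPos = str.indexOf(open);
        if (openPos == -1) {
            return null; // open string not found
        }
        // The search for 'close' starts right after the end of 'open'.
        int contentStart = openPos + open.length();
        int closePos = str.indexOf(close, contentStart);
        if (closePos == -1) {
            return null; // close string not found to the right of 'open'
        }
        return str.substring(contentStart, closePos);
    }

    public static void main(String[] args) {
        String text = "This is my string";
        System.out.println("[" + between(text, " ", " ") + "]");         // [is]
        System.out.println("[" + between(text, "is", " ") + "]");        // []
        System.out.println("[" + between(text, "This", "string") + "]"); // [ is my ]
        System.out.println(between(text, "not", "string"));              // null
    }
}
```

Because the result can be null, a test that calls further methods on it should check for null first, otherwise it risks a NullPointerException.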

To sum it up, here is a simple overview:

Position based search:

substring --> with 2 parameters (specifying the start position for the extraction) or 3 parameters (specifying both the start and end position for the extraction)

String based search:

substringBefore --> extraction before the first occurrence of a string

substringAfter --> extraction after the first occurrence of a string

substringBeforeLast --> extraction before the last occurrence of a string

substringAfterLast --> extraction after the last occurrence of a string

substringBetween --> extraction after the first occurrence of a string and before the first occurrence of another string (the search for the second string begins to the right of the first one)

 
