Let’s start with an example. A new set of requirements comes in for you to develop. As soon as you begin to analyze the requirements, you identify a particular method you need to implement: a method that searches for substrings between two tags in a given string and returns all the matching substrings. Let’s call this method substringsBetween(), inspired by the Apache Commons Lang library. You are about to test a real-world open source method.
After some thinking, you end up with the following requirements for the substringsBetween() method:
Searches a string for substrings delimited by a start and end tag, returning all matching substrings in an array.
str: The string containing the substrings. Null returns null; an empty string returns an empty array.
open: The string identifying the start of the substring. An empty string returns null.
close: The string identifying the end of the substring. An empty string returns null.
The program returns a string array of substrings, or null if there is no match.
Example: if str = “axcaycazc”, open = “a”, and close = “c”, the output will be an array containing [“x”, “y”, “z”]. This is the case because the “a<something>c” substring appears three times in the original string: the first contains “x” in the middle, the second “y,” and the last “z.”
With these requirements in mind, you write the implementation shown in listing 2.1. You may or may not use TDD to help you develop this feature. You are somewhat confident that the program works, but not completely.
Listing 2.1 Implementing the substringsBetween() method
public static String[] substringsBetween(final String str,
        final String open, final String close) {
    if (str == null || isEmpty(open) || isEmpty(close)) {   // ❶
        return null;
    }
    int strLen = str.length();
    if (strLen == 0) {                                      // ❷
        return EMPTY_STRING_ARRAY;
    }
    int closeLen = close.length();
    int openLen = open.length();
    List<String> list = new ArrayList<>();
    int pos = 0;                                            // ❸
    while (pos < strLen - closeLen) {
        int start = str.indexOf(open, pos);                 // ❹
        if (start < 0) {                                    // ❺
            break;
        }
        start += openLen;
        int end = str.indexOf(close, start);                // ❻
        if (end < 0) {                                      // ❼
            break;
        }
        list.add(str.substring(start, end));                // ❽
        pos = end + closeLen;                               // ❾
    }
    if (list.isEmpty()) {                                   // ❿
        return null;
    }
    return list.toArray(EMPTY_STRING_ARRAY);
}
❶ If the pre-conditions do not hold, returns null right away
❷ If the string is empty, returns an empty array immediately
❸ A pointer that indicates the position of the string we are looking at
❹ Looks for the next occurrence of the open tag
❺ Breaks the loop if the open tag does not appear again in the string
❻ Looks for the next occurrence of the close tag, starting after the open tag
❼ Breaks the loop if the close tag does not appear again in the string
❽ Gets the substring between the open and close tags
❾ Moves the pointer to after the close tag we just found
❿ Returns null if we do not find any substrings
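Listing 2.1 calls isEmpty() and refers to EMPTY_STRING_ARRAY without showing either. The following is a minimal runnable sketch that wraps the listing in a class, assuming that isEmpty() treats both null and "" as empty and that EMPTY_STRING_ARRAY is a zero-length String array. The class name SubstringsBetweenSketch is hypothetical, and the helpers in the real Apache Commons Lang library may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubstringsBetweenSketch {

    // Assumed helper: a shared zero-length array.
    private static final String[] EMPTY_STRING_ARRAY = new String[0];

    // Assumed helper: null and "" both count as empty.
    private static boolean isEmpty(final String s) {
        return s == null || s.isEmpty();
    }

    public static String[] substringsBetween(final String str,
            final String open, final String close) {
        if (str == null || isEmpty(open) || isEmpty(close)) {
            return null;
        }
        int strLen = str.length();
        if (strLen == 0) {
            return EMPTY_STRING_ARRAY;
        }
        int closeLen = close.length();
        int openLen = open.length();
        List<String> list = new ArrayList<>();
        int pos = 0;
        while (pos < strLen - closeLen) {
            int start = str.indexOf(open, pos);
            if (start < 0) {
                break;
            }
            start += openLen;
            int end = str.indexOf(close, start);
            if (end < 0) {
                break;
            }
            list.add(str.substring(start, end));
            pos = end + closeLen;
        }
        if (list.isEmpty()) {
            return null;
        }
        return list.toArray(EMPTY_STRING_ARRAY);
    }

    public static void main(String[] args) {
        // The example from the requirements: prints [x, y, z]
        System.out.println(Arrays.toString(substringsBetween("axcaycazc", "a", "c")));
    }
}
```

Running main() is only a smoke test of the one example in the requirements; it says nothing yet about nulls, empty strings, or missing tags.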
Let’s walk through an example. Consider the inputs str = “axcaycazc”, open = “a”, and close = “c”. None of the three strings are empty, so the method goes straight to the openLen and closeLen variables. These two variables store the length of the open and close strings, respectively. In this case, both are equal to 1, as “a” and “c” are strings with a single character.
The program then goes into its main loop. This loop runs while there still may be substrings in the string to check. In the first iteration, pos equals zero (the beginning of the string). We call indexOf, looking for a possible occurrence of the open tag. We pass the open tag and the position to start the search, which at this point is 0. indexOf returns 0, which means we found an open tag. (The first element of the string is already the open tag.)
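The two-argument String.indexOf(String, int) used here returns the index of the first occurrence at or after the given position, or -1 if there is none. A small sketch of that behavior on the example string follows; the class IndexOfDemo and its find() wrapper are hypothetical names used only for illustration.

```java
public class IndexOfDemo {

    // Thin wrapper around String.indexOf(String, int), for illustration only.
    public static int find(String str, String tag, int from) {
        return str.indexOf(tag, from);
    }

    public static void main(String[] args) {
        String str = "axcaycazc";
        System.out.println(find(str, "a", 0)); // 0: the open tag is the first character
        System.out.println(find(str, "a", 1)); // 3: the search ignores everything before index 1
        System.out.println(find(str, "c", 1)); // 2: the first close tag after the open tag
        System.out.println(find(str, "q", 0)); // -1: no occurrence at all
    }
}
```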
The program then looks for the end of the substring by calling the indexOf method again, this time on the close tag. Note that we increase the start position by the length of the open tag because we want to look for the close tag after the end of the entire open tag. In this example the open tag has a length of one, but in general it can have any length. If we find a close tag, this means there is a substring to return to the user. We get this substring by calling the substring method with the start and end positions as parameters. We then reposition our pos pointer, and the loop iterates again. Figure 2.1 shows the three iterations of the loop as well as the locations to which the main pointers (start, end, and pos) are pointing.
Figure 2.1 The three iterations of the substringsBetween method for our example
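The pointer values in each iteration can also be reproduced in code. The sketch below re-runs only the main loop of the method and records start, end, and pos once per iteration. It assumes a non-null str and non-empty tags, and it reports start after the open tag's length has been added. The class name LoopTrace and the output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopTrace {

    // Re-runs the main loop and records the pointers after each iteration.
    // Assumes str is non-null and both tags are non-empty.
    public static List<String> trace(String str, String open, String close) {
        List<String> steps = new ArrayList<>();
        int pos = 0;
        while (pos < str.length() - close.length()) {
            int start = str.indexOf(open, pos);
            if (start < 0) {
                break;
            }
            start += open.length();   // first character after the open tag
            int end = str.indexOf(close, start);
            if (end < 0) {
                break;
            }
            pos = end + close.length();
            steps.add("start=" + start + " end=" + end + " pos=" + pos
                    + " substring=" + str.substring(start, end));
        }
        return steps;
    }

    public static void main(String[] args) {
        trace("axcaycazc", "a", "c").forEach(System.out::println);
        // start=1 end=2 pos=3 substring=x
        // start=4 end=5 pos=6 substring=y
        // start=7 end=8 pos=9 substring=z
    }
}
```

After the third iteration pos is 9, which is no longer smaller than strLen - closeLen (9 - 1 = 8), so the loop stops.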
Now that you have finished the first implementation, you flip your mind to testing mode. It is time for specification and boundary testing. As an exercise, before we work on this problem together, look at the requirements one more time and write down all the test cases you can come up with. The format does not matter; it can be something like “all parameters null.” When you are finished, compare your initial test suite with the one we are about to derive together.
The best way to ensure that this method works properly would be to test all the possible combinations of inputs and outputs. Given that substringsBetween() receives three string parameters as an input, we would need to pass all possible valid strings to the three parameters, combined in all imaginable ways. Exhaustive testing is rarely possible. We have to be pragmatic.
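A rough back-of-the-envelope sketch shows how quickly the input space grows. It assumes, purely for illustration, that each parameter uses only the 26 lowercase letters and is at most five characters long; real Java strings are far less restricted. The class and method names are hypothetical.

```java
import java.math.BigInteger;

public class InputSpaceSize {

    // Number of distinct strings of length 0..maxLen over an alphabet of size k:
    // 1 + k + k^2 + ... + k^maxLen.
    public static BigInteger stringsUpTo(int alphabetSize, int maxLen) {
        BigInteger k = BigInteger.valueOf(alphabetSize);
        BigInteger total = BigInteger.ZERO;
        for (int len = 0; len <= maxLen; len++) {
            total = total.add(k.pow(len));
        }
        return total;
    }

    // Three independent string parameters, so the per-parameter count is cubed.
    public static BigInteger inputCombinations(int alphabetSize, int maxLen) {
        return stringsUpTo(alphabetSize, maxLen).pow(3);
    }

    public static void main(String[] args) {
        System.out.println(stringsUpTo(26, 5));       // 12356631 strings per parameter
        System.out.println(inputCombinations(26, 5)); // combinations for str, open, and close
    }
}
```

Even under these artificially tight limits, the three parameters combine into more than 10^21 inputs, far too many to run one by one.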