Monday, 21 March 2016

Java String parsing: Pattern and Matcher

1. Requirements:

  - given a string formed of letters and digits, split it into two arrays:
    one array of numbers and one array of words
  - sort the array of words alphabetically (case-insensitive) and display it in the console
  - sum the array of numbers, then display the array and the sum
  - Example:
      - input: dog7break89Wind678solace2Dream23
      - sorted array of words: break, dog, Dream, solace, Wind
      - array of numbers: 7, 89, 678, 2, 23
        sum: 799
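
  To make the requirements concrete, here is a minimal, self-contained sketch that reproduces the example above in a single class. The class name SplitDemo and the helpers findAll, sortedWords and parseNumbers are hypothetical illustrations, not the code from the post's zip file; the two regexes are the ones described in section 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical end-to-end sketch of the requirements (class and method names are illustrative).
public class SplitDemo {

    // Collects every substring of input that matches regex, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // Words (runs of ASCII letters) sorted alphabetically, ignoring case.
    static List<String> sortedWords(String input) {
        List<String> words = findAll("[a-zA-Z]+", input);
        words.sort(String.CASE_INSENSITIVE_ORDER);
        return words;
    }

    // Digit runs parsed to Integers, kept in order of appearance.
    static List<Integer> parseNumbers(String input) {
        List<Integer> result = new ArrayList<>();
        for (String digits : findAll("\\d+", input)) {
            result.add(Integer.valueOf(digits));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "dog7break89Wind678solace2Dream23";
        List<Integer> numbers = parseNumbers(input);
        int sum = numbers.stream().mapToInt(Integer::intValue).sum();

        System.out.println("sorted array of words: " + String.join(", ", sortedWords(input)));
        System.out.println("array of numbers: " + numbers);
        System.out.println("sum: " + sum);
    }
}
```

  Run as a plain Java program (Java 8 or later), it should print the words break, dog, Dream, solace, Wind, the numbers as [7, 89, 678, 2, 23] and the sum 799.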

2. Regex description

  - regular expressions were used to extract the desired components of the string.
  - to extract the natural numbers, \\d+ was used:
    \d stands for a digit (0-9), and the extra \ is an escape character, needed because
      the regex is written inside a Java string literal
    + matches the preceding pattern one or more times
    so \\d+ matches one or more consecutive characters from the 0-9 range
  - to extract the words, [a-zA-Z]+ was used:
    the square brackets define a character class, which matches any single character listed inside
    in our case it matches one letter of the English alphabet, lowercase or uppercase
    by repeating the class one or more times with +, we get a whole word
    in the current example, a word is any run of consecutive letters
    /** Regex for the natural number matcher. */
    private static final String NUMBER_MATCHER_REGEX = "\\d+";
    /** Regex for the word matcher. */
    private static final String WORD_MATCHER_REGEX = "[a-zA-Z]+";
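
  A quick way to check what the two constants actually match is Pattern.matches, which succeeds only when the entire input matches the regex. This sketch copies the constants into a hypothetical RegexCheck class (package-private here, so they can be inspected) and tries a few sample inputs.

```java
import java.util.regex.Pattern;

// Illustrative check of the two regexes; the constant values are the same as in the post.
public class RegexCheck {

    static final String NUMBER_MATCHER_REGEX = "\\d+";
    static final String WORD_MATCHER_REGEX = "[a-zA-Z]+";

    public static void main(String[] args) {
        // The Java literal "\\d+" holds the three characters that the regex engine sees.
        System.out.println(NUMBER_MATCHER_REGEX.length());                  // 3
        // Pattern.matches requires the whole input to match.
        System.out.println(Pattern.matches(NUMBER_MATCHER_REGEX, "678"));   // true
        System.out.println(Pattern.matches(NUMBER_MATCHER_REGEX, "7dog"));  // false
        // + means "one or more", so an empty string does not match.
        System.out.println(Pattern.matches(NUMBER_MATCHER_REGEX, ""));      // false
        System.out.println(Pattern.matches(WORD_MATCHER_REGEX, "Dream"));   // true
        // Only ASCII letters are in the class, so an accented letter breaks the match.
        System.out.println(Pattern.matches(WORD_MATCHER_REGEX, "caf\u00e9")); // false
    }
}
```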

3. Pattern and Matcher

  - the pattern is defined with the help of the Pattern class:
    the static method compile takes the regex as a parameter and returns a Pattern instance
    from that instance you obtain a Matcher object by calling matcher on the input string;
      the Matcher is then used to retrieve the matching substrings
  - the Matcher object uses the method find to locate the next subsequence that matches the pattern
    if find returns true, a match was found and it can be retrieved with the method group
    by calling find in a while loop you can retrieve all the matching subsequences
    private ArrayList<String> getWordList() {
        final ArrayList<String> list = new ArrayList<>();
        final Matcher matcher = getWordMatcher();
        // each successful find() advances to the next match; group() returns its text
        while (matcher.find()) {
            list.add(matcher.group());
        }
        return list;
    }
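
  The loop above calls getWordMatcher() (and the number list later calls getNumberMatcher()), which the post does not show. One possible shape for these helpers, assuming the input string is stored in a field, is sketched below; the StringParser class, its constructor and the NUMBER_PATTERN/WORD_PATTERN fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser class: one possible way to write the matcher helpers.
public class StringParser {

    // Each regex is compiled once; Pattern instances are immutable and can be shared.
    private static final Pattern NUMBER_PATTERN = Pattern.compile("\\d+");
    private static final Pattern WORD_PATTERN = Pattern.compile("[a-zA-Z]+");

    private final String input;

    public StringParser(String input) {
        this.input = input;
    }

    // A Matcher keeps its search position, so a fresh one is created for every scan.
    Matcher getNumberMatcher() {
        return NUMBER_PATTERN.matcher(input);
    }

    Matcher getWordMatcher() {
        return WORD_PATTERN.matcher(input);
    }

    // The same find()/group() loop as above, with generic types.
    ArrayList<String> getWordList() {
        final ArrayList<String> list = new ArrayList<>();
        final Matcher matcher = getWordMatcher();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        return list;
    }

    public static void main(String[] args) {
        StringParser parser = new StringParser("dog7break89Wind678solace2Dream23");
        System.out.println(parser.getWordList()); // [dog, break, Wind, solace, Dream]
    }
}
```

  Compiling each Pattern once and creating a new Matcher per call is a common design: the Pattern documentation describes instances as immutable and thread-safe, while a Matcher is stateful and not safe to share.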

4. Adding results to ArrayList

  - the strings produced by the number matcher need to be parsed into Integers:
    the list can be filled with Integers by calling Integer.valueOf on each value returned
      by matcher.group inside the matcher.find loop
  - the strings produced by the word matcher don't need parsing, but they do need sorting:
    the sorting can be done with Collections.sort, passing String.CASE_INSENSITIVE_ORDER
      as the comparator so that uppercase and lowercase words are ordered together
    public ArrayList<Integer> getIntegerList() {
        final ArrayList<Integer> list = new ArrayList<>();
        final Matcher matcher = getNumberMatcher();
        while (matcher.find()) {
            // parse each matched run of digits into an Integer
            list.add(Integer.valueOf(matcher.group()));
        }
        return list;
    }

    public ArrayList<String> getSortedWordList() {
        final ArrayList<String> list = getWordList();
        // sort alphabetically, ignoring the difference between uppercase and lowercase
        Collections.sort(list, String.CASE_INSENSITIVE_ORDER);
        return list;
    }
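
  To see why the comparator matters, the sketch below (hypothetical class SortAndParseDemo) sorts the example words once with Collections.sort in natural order and once with String.CASE_INSENSITIVE_ORDER, and parses the example digit strings with Integer.valueOf. Natural String order compares character codes, so every uppercase ASCII letter sorts before every lowercase one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Compares natural String order with case-insensitive order, and parses digit strings.
public class SortAndParseDemo {

    static List<String> sortNatural(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy); // compares char codes: 'D' and 'W' come before 'b'
        return copy;
    }

    static List<String> sortIgnoringCase(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    static List<Integer> parseAll(List<String> digitRuns) {
        List<Integer> numbers = new ArrayList<>();
        for (String run : digitRuns) {
            // throws NumberFormatException if the value exceeds Integer.MAX_VALUE
            numbers.add(Integer.valueOf(run));
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("dog", "break", "Wind", "solace", "Dream");
        System.out.println(sortNatural(words));      // [Dream, Wind, break, dog, solace]
        System.out.println(sortIgnoringCase(words)); // [break, dog, Dream, solace, Wind]
        System.out.println(parseAll(Arrays.asList("7", "89", "678", "2", "23"))); // [7, 89, 678, 2, 23]
    }
}
```

  Because Integer.valueOf rejects values above 2147483647, an input containing a very long run of digits would need Long.valueOf or BigInteger instead.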

5. Computing the sum

  - the sum can be obtained with a for loop that iterates over each value and adds it to the result
    it can also be done with streams, by calling stream() on the ArrayList
    mapToInt is applied to that stream to get an IntStream, and finally its sum
      method returns the result
    public int getSum() {
        return getIntegerList().stream().mapToInt(e -> e).sum();
    }
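
  For comparison, here is a sketch of both approaches mentioned above side by side, in a hypothetical SumDemo class; Integer::intValue is used in place of e -> e and performs the same unboxing.

```java
import java.util.Arrays;
import java.util.List;

// Two equivalent ways to sum the parsed numbers: an explicit loop and an IntStream.
public class SumDemo {

    static int sumWithLoop(List<Integer> numbers) {
        int result = 0;
        for (int value : numbers) { // each Integer is auto-unboxed to int
            result += value;
        }
        return result;
    }

    static int sumWithStream(List<Integer> numbers) {
        return numbers.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(7, 89, 678, 2, 23);
        System.out.println(sumWithLoop(numbers));   // 799
        System.out.println(sumWithStream(numbers)); // 799
    }
}
```

  Both versions return an int, so a very large total would silently wrap around; mapToLong(Integer::longValue) with a long result is one way to avoid that if it matters.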

6. Source code:

  - the source code is contained in the following zip file.
  - the source should be easy to import into Eclipse.
  - the zip file is also hosted on Google Drive.

  Any feedback is appreciated.