This Java example shows how to split a string into words. It also shows how to break sentences into words using the split method.
How to split a String by words?
The simplest way to split a string into words is to split it on the space character, as shown in the example below.

    package com.javacodeexamples.stringexamples;

    import java.util.Arrays;

    public class StringSplitByWords {

        public static void main(String[] args) {

            String sentence = "Java String split by words from sentence";

            //get words from sentence
            String[] words = splitSentenceByWords(sentence);

            //print words
            System.out.println(Arrays.toString(words));
        }

        private static String[] splitSentenceByWords(String str){

            //if string is empty or null, return empty array
            if(str == null || str.equals(""))
                return new String[0];

            String[] words = str.split(" ");
            return words;
        }
    }

Output

    [Java, String, split, by, words, from, sentence]

As you can see from the output, it worked for the test sentence string. The sentence is broken down into words by splitting it using space.
Let’s try some other not-so-simple sentences.

    package com.javacodeexamples.stringexamples;

    import java.util.Arrays;

    public class StringSplitByWords {

        public static void main(String[] args) {

            String[] sentences = {
                    "string  with    lot of   spaces",
                    "Hello, can I help you?",
                    "Java is a 'programming' language.",
                    "this is user-generated content"
            };

            for (String sentence : sentences){

                //get words from sentence
                String[] words = splitSentenceByWords(sentence);

                //print words
                System.out.println(Arrays.toString(words));
            }
        }

        private static String[] splitSentenceByWords(String str){

            //if string is empty or null, return empty array
            if(str == null || str.equals(""))
                return new String[0];

            String[] words = str.split(" ");
            return words;
        }
    }

Output

    [string, , with, , , , lot, of, , , spaces]
    [Hello,, can, I, help, you?]
    [Java, is, a, 'programming', language.]
    [this, is, user-generated, content]

As you can see from the output, our code did not work as expected. The reason is that a simple split by a single space is not enough to separate words: consecutive spaces produce empty strings, and punctuation marks such as the period, comma, and question mark stay attached to the words.
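The effect of consecutive spaces can be isolated with a small sketch. The class and helper names here are hypothetical and not part of the original example. It splits on runs of whitespace (`\\s+`) instead of a single space, which removes the empty strings but, as the second call suggests, still leaves punctuation attached to the words.

```java
import java.util.Arrays;

public class WhitespaceSplitSketch {

    // Hypothetical helper: splits on runs of whitespace instead of a single space
    public static String[] splitOnWhitespace(String str) {
        if (str == null || str.trim().isEmpty()) {
            return new String[0];
        }
        // trim() removes leading whitespace, which would otherwise
        // produce an empty first element
        return str.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Runs of spaces no longer produce empty strings:
        // [string, with, lot, of, spaces]
        System.out.println(Arrays.toString(
                splitOnWhitespace("string  with    lot of   spaces")));

        // Punctuation still stays attached to the words:
        // [Hello,, can, I, help, you?]
        System.out.println(Arrays.toString(
                splitOnWhitespace("Hello, can I help you?")));
    }
}
```

So whitespace handling alone fixes only the first test sentence; the punctuation still needs to be treated as a delimiter.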
To make the code handle punctuation and symbols, we will change our regular expression pattern from a single space to a character class that contains the space along with the punctuation marks and symbols, as given below.

    package com.javacodeexamples.stringexamples;

    import java.util.Arrays;

    public class StringSplitByWords {

        public static void main(String[] args) {

            String[] sentences = {
                    "string  with    lot of   spaces",
                    "Hello, can I help you?",
                    "Java is a 'programming' language.",
                    "this is [user-generated] content"
            };

            for (String sentence : sentences){

                //get words from sentence
                String[] words = splitSentenceByWords(sentence);

                //print words
                System.out.println(Arrays.toString(words));
            }
        }

        private static String[] splitSentenceByWords(String str){

            //if string is empty or null, return empty array
            if(str == null || str.equals(""))
                return new String[0];

            //split on one or more spaces, punctuation marks, or symbols
            String[] words = str.split("[ !\"\\#$%&'()*+,-./:;<=>?@\\[\\]^_`{|}~]+");
            return words;
        }
    }

Output

    [string, with, lot, of, spaces]
    [Hello, can, I, help, you]
    [Java, is, a, programming, language]
    [this, is, user, generated, content]

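This time we got the output we wanted. The regex pattern [ !\"\\#$%&'()*+,-./:;<=>?@\\[\\]^_`{|}~]+ includes almost all the punctuation marks and symbols that can appear in a sentence, along with the space. The + at the end matches one or more consecutive occurrences of these characters, so a run of delimiters such as a comma followed by a space does not produce empty words. Note that the hyphen is part of the set, which is why user-generated is split into two words.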
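One case the + quantifier does not cover is a sentence that begins with a delimiter: Java's split then returns an empty string as the first element. The following is a minimal sketch of one way to handle that, assuming it is acceptable to filter the result afterwards. The class name, the helper name, and the test sentence are hypothetical and not part of the original example.

```java
import java.util.Arrays;

public class LeadingDelimiterSketch {

    // Same punctuation pattern as used in this article
    private static final String DELIMITERS =
            "[ !\"\\#$%&'()*+,-./:;<=>?@\\[\\]^_`{|}~]+";

    // Hypothetical helper: splits on the pattern and drops empty strings
    public static String[] splitAndDropEmpty(String str) {
        if (str == null || str.isEmpty()) {
            return new String[0];
        }
        return Arrays.stream(str.split(DELIMITERS))
                .filter(word -> !word.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String sentence = "'Java' is a language.";

        // Plain split keeps an empty first element: [, Java, is, a, language]
        System.out.println(Arrays.toString(sentence.split(DELIMITERS)));

        // Filtered version: [Java, is, a, language]
        System.out.println(Arrays.toString(splitAndDropEmpty(sentence)));
    }
}
```

Trailing empty strings are already discarded by split when it is called without a limit argument, so only the leading one needs this extra step.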
Instead of this pattern, you can also use the \\P{L}+ pattern to extract words from the sentence. Here \\p{L} is the Unicode character class for letters, and the uppercase \\P negates it, so \\P{L} matches any character that is not a letter. You need to change the line with the split method as given below.

    String[] words = str.split("\\P{L}+");

Please note that the \\P{L} expression works for both ASCII and non-ASCII letters (e.g. accented characters as in “café” or “kākā”).
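A short sketch of this behavior follows; the class name, helper name, and test sentences are hypothetical. The accented letters are written as Unicode escapes so that the result does not depend on the source file encoding, and they are assumed to be in precomposed form (a single code point per letter). The second call illustrates a trade-off of this pattern: digits and apostrophes are not letters, so they act as delimiters too.

```java
import java.util.Arrays;

public class UnicodeLetterSplitSketch {

    // Splits on runs of characters that are not Unicode letters
    public static String[] splitByNonLetters(String str) {
        if (str == null || str.isEmpty()) {
            return new String[0];
        }
        return str.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café" and "k\u0101k\u0101" is "kākā";
        // the accented letters stay inside their words
        System.out.println(Arrays.toString(
                splitByNonLetters("The caf\u00e9 serves k\u0101k\u0101, really?")));

        // The apostrophe and the digit are treated as delimiters:
        // [Don, t, miss, Java, features]
        System.out.println(Arrays.toString(
                splitByNonLetters("Don't miss Java 8 features")));
    }
}
```

If contractions or numbers should count as words, the explicit punctuation pattern shown earlier may be the better fit.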
This example is a part of the Java String tutorial with examples and the Java RegEx tutorial with examples.
Please let me know your views in the comments section below.