This page contains everything you need to perform (and understand) word and character counting: a fully functional online word counter, a detailed explanation of the code that powers it, and a full dump of the final HTML, CSS, and JavaScript code that the tool uses.
First, here’s the functional word counter tool
The above word counter also features a keyword counter that displays the top 4 keywords in your text. This keyword counter becomes visible only after you have entered some text.
How to build a word counter like this one
In this section, I explain how you can build your own word counter that looks and works like mine above. If you just want to see the full code used for this tool, skip to the section titled “Here’s a full dump of the code that powers this tool”.
Note: All the counting done in this word counter relies heavily on regular expressions (or regex). So, you will need some understanding of regex if you’re planning to change the core behavior of this tool. But if you’re just reading this casually, no solid knowledge of regular expressions is necessary.
Getting user input
We need something to capture user input. To do this, we will be using the HTML textarea element. Like this:
<textarea placeholder="Enter your text here..."></textarea>
The JavaScript to select the above textarea will look like this:
var input = document.querySelectorAll('textarea')[0];
This assumes that our word counter textarea is the first (zero index) textarea on the webpage. If for some reason you have another textarea on your page that appears before the one you’re using for the word counter, adjust the array index accordingly.
We will use input.value to grab the value of the text the user enters into the textarea.
Since we want to automatically calculate word and character counts (and other results) as the user types, we need to execute some code when the keyup event is triggered. For this reason, the bulk of our JavaScript code is contained in a function that looks like this:
input.addEventListener('keyup', function() {
  // All the logic used for the word counter, sentence counter,
  // character counter, reading time calculator, and keyword finder
});
The outputs will be stored in simple HTML div elements that look like this:
<div class="output row">
  <div>Characters: <span id="characterCount">0</span></div>
  <div>Words: <span id="wordCount">0</span></div>
</div>
Counting words, characters, and sentences
Let’s now explore the regular expressions used for counting words and sentences.
To find words in an input string, we need two things:
- Word boundaries: \b
- Valid word characters: \w
To increase accuracy, we also look for words with hyphens (-). We want hyphenated words like “front-end” to be counted as one word instead of two or more.
Here’s our JavaScript regex code for word counting:
var words = input.value.match(/\b[-?(\w+)?]+\b/gi);
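- \b matches word boundaries – the start or end of a word
- [ ] is a character class – everything between the square brackets is a set of allowed characters, not a sub-pattern
- \w inside the class allows word characters (letters, digits, and the underscore)
- - inside the class allows hyphens, so that words like “front-end” are counted as one word instead of two
- ?, (, ) and + inside the class are matched literally as well, which means text such as “a+b” is also counted as a single word
- + after the closing bracket matches one or more characters from the class
- i makes our regex pattern case insensitive (\w already covers both uppercase and lowercase letters, so the flag does not change the result here)
- g instructs our pattern to do a global search instead of stopping at the first match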
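Because square brackets form a character class, symbols such as + and ? placed inside them are matched as literal characters. The sketch below compares the tool's pattern with a stricter alternative that only joins word characters with single hyphens. The helper names countWordsOriginal and countWordsStrict are hypothetical and not part of the tool's code; the stricter pattern is just one possible choice.

```javascript
// Pattern used by the tool: any run of word characters, hyphens, ?, (, ) or +.
function countWordsOriginal(text) {
  var words = text.match(/\b[-?(\w+)?]+\b/gi);
  return words ? words.length : 0; // match() returns null when nothing matches
}

// Assumed stricter alternative: word characters, optionally joined by single hyphens.
function countWordsStrict(text) {
  var words = text.match(/\b\w+(?:-\w+)*\b/g);
  return words ? words.length : 0;
}

console.log(countWordsOriginal("front-end developer")); // 2
console.log(countWordsStrict("front-end developer"));   // 2
console.log(countWordsOriginal("a+b"));                 // 1
console.log(countWordsStrict("a+b"));                   // 2
```

Both patterns agree on ordinary and hyphenated words. They only differ on text that contains the extra symbols the character class happens to allow.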
Sentences are a little easier to handle because we can just look for sentence separators (or delimiters) and split whenever we find them. JavaScript code for doing this could look like:
var sentences = input.value.split(/[.|!|?]/g);
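In the above code, we have a character class that looks for the 3 commonly used sentence separators: period (.), exclamation mark (!), and question mark (?).
Note that inside square brackets the | is not an “or” operator but a literal character, so this pattern also splits on the pipe character. A class written as [.!?] would match only the three separators.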
After executing the above line of code, the sentences variable will hold an array containing all the sentences.
But here’s a tricky situation. What if the text we’re counting is something like “talk to you later...”? Because of the three sequential periods, our sentences array will hold 4 strings – one correct sentence, followed by three empty strings.
To fix this, we modify our sentence calculation code to this instead:
var sentences = input.value.split(/[.|!|?]+/g);
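The + at the end of the pattern treats a run of consecutive sentence delimiters as a single separator, so “talk to you later...” now splits into one sentence followed by a single empty string. The full code further down subtracts 1 from the array length to discount that trailing empty string, which gives the correct count of one sentence.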
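The effect of the + can be checked directly. The sketch below also shows a different way to get the final count: filtering out empty pieces instead of subtracting 1, so that text without a closing delimiter still counts as a sentence. The countSentences helper is hypothetical (it is not part of the tool), and it assumes that an unterminated trailing sentence should be counted.

```javascript
// Count sentences by splitting on runs of ., ! or ? and ignoring empty pieces.
function countSentences(text) {
  return text
    .split(/[.!?]+/)
    .filter(function (piece) { return piece.trim().length > 0; })
    .length;
}

// Splitting with and without the + quantifier:
console.log("talk to you later...".split(/[.|!|?]/).length);  // 4
console.log("talk to you later...".split(/[.|!|?]+/).length); // 2

console.log(countSentences("talk to you later..."));   // 1
console.log(countSentences("Hi! How are you? Fine.")); // 3
console.log(countSentences("no punctuation here"));    // 1
```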
Counting keywords
When you start typing (or paste) text into the above word counter, you will notice that a “top keywords” container automatically appears at the bottom of the tool. This keywords section displays the four top keywords in the entered text and the number of times each keyword occurs.
Keyword counting has various uses. One common use is to prevent yourself from overusing certain keywords in your writing.
Let’s discuss how to calculate it…
Counting keywords: Step 1 – Remove all stop words
Stop words are the most common words in any language. Before doing keyword analysis on our input text, we first need to filter out all the stop words. But since there is no universally accepted list of stop words in English, you will have to search for one yourself and choose a good list from what you find.
Instead of doing your own search, you can also use my own list of stop words. It is contained in the JavaScript section of the full code (provided below).
To filter out stop words, we will be using this code:
var nonStopWords = [];
var stopWords = ["a", "able", "about", "above", "abst", "accordance", "according", ...];
for (var i = 0; i < words.length; i++) {
  if (stopWords.indexOf(words[i].toLowerCase()) === -1 && isNaN(words[i])) {
    nonStopWords.push(words[i].toLowerCase());
  }
}
The stopWords array contains all the words we want to check against. If a word does not exist in the stopWords array, we add it to the nonStopWords array. If a word exists in the stopWords array, we ignore it. We also ignore all the numbers using the isNaN condition: isNaN returns false for a numeric string such as "42", so it fails the check and is skipped.
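The same filter can be written as a small standalone function, which makes it easy to try out with a short list. This is a minimal sketch: filterStopWords and sampleStopWords are hypothetical names, the three-word list is only for illustration, and the Set lookup is a substitution for the indexOf call used in the tool.

```javascript
// Return lowercased words that are neither stop words nor numbers.
function filterStopWords(words, stopWords) {
  var stopSet = new Set(stopWords); // Set lookup instead of indexOf on every word
  var result = [];
  for (var i = 0; i < words.length; i++) {
    var word = words[i].toLowerCase();
    // isNaN("42") is false, so numeric strings are skipped
    if (!stopSet.has(word) && isNaN(word)) {
      result.push(word);
    }
  }
  return result;
}

var sampleStopWords = ["a", "the", "and"]; // tiny illustrative list, not the tool's full list
console.log(filterStopWords(["The", "cat", "and", "42", "dogs"], sampleStopWords));
// logs an array containing "cat" and "dogs"
```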
Counting keywords: Step 2 – Make an object containing keywords and their count
In this step we create an object called keywords. We then loop through the words in our nonStopWords array, checking whether the keywords object already contains the current word.
If the word already exists in the keywords object, we increment its value by one. If not, we create a new key-value pair (the key is the word, and the value is 1).
var keywords = {};
for (var i = 0; i < nonStopWords.length; i++) {
  if (nonStopWords[i] in keywords) {
    keywords[nonStopWords[i]] += 1;
  } else {
    keywords[nonStopWords[i]] = 1;
  }
}
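One detail to be aware of with the in operator is that it also reports properties every object inherits, so a word like “constructor” looks as if it were already in an empty object. The sketch below shows one way to avoid that by counting with a Map. The countKeywords helper is hypothetical, and using a Map is a substitution for the plain object in the code above.

```javascript
// Count occurrences with a Map so inherited names such as "constructor"
// are not mistaken for keys that already exist.
function countKeywords(words) {
  var counts = new Map();
  for (var i = 0; i < words.length; i++) {
    counts.set(words[i], (counts.get(words[i]) || 0) + 1);
  }
  return counts;
}

console.log("constructor" in {}); // true, because the property is inherited

var counts = countKeywords(["regex", "constructor", "regex"]);
console.log(counts.get("regex"));       // 2
console.log(counts.get("constructor")); // 1
```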
Counting keywords: Step 3 – Sort the keywords object
In order to use the native sort method in JavaScript on the keywords object, we first convert it to a 2-dimensional array.
var sortedKeywords = [];
for (var keyword in keywords) {
  sortedKeywords.push([keyword, keywords[keyword]]);
}
sortedKeywords.sort(function(a, b) {
  return b[1] - a[1];
});
We now have a 2D array named sortedKeywords. We use this in the 4th step below.
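For comparison, the conversion and the sort can also be done in one expression with Object.entries, which returns the same kind of [word, count] pairs. This is only a sketch of an equivalent approach; sortKeywords is a hypothetical helper name and the sample counts are made up.

```javascript
// Convert a { word: count } object to [word, count] pairs, highest count first.
function sortKeywords(keywords) {
  return Object.entries(keywords).sort(function (a, b) {
    return b[1] - a[1]; // a larger count sorts earlier
  });
}

var sorted = sortKeywords({ counter: 2, regex: 5, word: 3 });
console.log(sorted[0][0], sorted[0][1]); // regex 5
console.log(sorted[2][0], sorted[2][1]); // counter 2
```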
Counting keywords: Step 4 – Display the top 4 keywords and their count
Now, we display the first four elements of the sortedKeywords array (if there are fewer than four items, we display however many the array has). For each item, the word is at position 0, and the count is at position 1.
We create a new HTML list item (li) for each entry and append it to our ul with the ID of topKeywords:
topKeywords.innerHTML = "";
for (var i = 0; i < sortedKeywords.length && i < 4; i++) {
  var li = document.createElement('li');
  li.innerHTML = "<b>" + sortedKeywords[i][0] + "</b>: " + sortedKeywords[i][1];
  topKeywords.appendChild(li);
}
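The “first four, or fewer” logic can be tried without a browser by separating the text formatting from the DOM updates. The following is a rough sketch: formatTopKeywords is a hypothetical helper, and the commented-out browser part uses textContent rather than innerHTML as a design choice that drops the bold styling.

```javascript
// Build the display text for up to `limit` keywords from [word, count] pairs.
function formatTopKeywords(sortedKeywords, limit) {
  // slice() simply returns fewer items when the array is shorter than `limit`
  return sortedKeywords.slice(0, limit).map(function (pair) {
    return pair[0] + ": " + pair[1];
  });
}

var labels = formatTopKeywords([["regex", 5], ["word", 3]], 4);
console.log(labels.length); // 2
console.log(labels[0]);     // regex: 5

// In the browser, each label could then be added as plain text:
// var li = document.createElement('li');
// li.textContent = labels[0];
// topKeywords.appendChild(li);
```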
Here’s a full dump of the code that powers this tool
The HTML code…
<div class="ehi-wordcount-container">
  <textarea placeholder="Enter your text here..."></textarea>
  <div class="output row">
    <div>Characters: <span id="characterCount">0</span></div>
    <div>Words: <span id="wordCount">0</span></div>
  </div>
  <div class="output row">
    <div>Sentences: <span id="sentenceCount">0</span></div>
    <div>Paragraphs: <span id="paragraphCount">0</span></div>
  </div>
  <div class="output row">
    <div>Reading Time: <span id="readingTime">0</span></div>
  </div>
  <div class="keywords">Top keywords:<ul id="topKeywords"></ul></div>
</div>
The CSS code…
.ehi-wordcount-container {
  margin: 2% auto;
  padding: 15px;
  background-color: #FFFFFF;
  -webkit-box-shadow: 0px 1px 4px 0px rgba(0, 0, 0, 0.2);
  box-shadow: 0px 1px 4px 0px rgba(0, 0, 0, 0.2);
}
.ehi-wordcount-container textarea {
  width: 100%;
  height: 300px;
  padding: 10px;
  border: 1px solid #d9d9d9;
  outline: none;
  font-size: 1em;
  resize: none;
  line-height: 1.5em;
}
.ehi-wordcount-container textarea:hover {
  border-color: #C0C0C0;
}
.ehi-wordcount-container textarea:focus {
  border-color: #4D90FE;
}
.ehi-wordcount-container .output.row {
  width: 100%;
  border: 1px solid #DDD;
  font-size: 1.4em;
  margin: 1% 0;
  background-color: #F9F9F9;
}
.ehi-wordcount-container .output.row div {
  display: inline-block;
  width: 42%;
  padding: 10px 15px;
  margin: 1%;
}
.ehi-wordcount-container .output.row span {
  font-weight: bold;
}
.ehi-wordcount-container .keywords {
  display: none;
  font-size: 1.4em;
  font-weight: 900;
}
.ehi-wordcount-container .keywords p {
  margin: 0px;
  padding: 0px;
}
.ehi-wordcount-container .keywords ul {
  font-weight: 400;
  border: 1px solid #DDD;
  font-size: 1em;
  background-color: #F9F9F9;
  margin: 1% 0;
}
.ehi-wordcount-container .keywords li {
  display: inline-block;
  width: 44%;
  padding: 10px;
  margin: 1%;
}
Note that the actual styles that apply to the word counter above are also influenced by overall styles used on this website.
So, just using my CSS styles above may not give you exactly the same look and feel. But you should get something close enough.
You can quite easily make additional tweaks to the entire CSS code to make the tool look however you want.
Since CSS controls styles and appearance only, you can technically get a fully functional word counter even if you ignore the above CSS entirely. It just may not look very good.
And now the JavaScript code… This is where the real magic happens.
"use strict"; var input = document.querySelectorAll('textarea')[0], characterCount = document.querySelector('#characterCount'), wordCount = document.querySelector('#wordCount'), sentenceCount = document.querySelector('#sentenceCount'), paragraphCount = document.querySelector('#paragraphCount'), readingTime = document.querySelector('#readingTime'), keywordsDiv = document.querySelectorAll('.keywords')[0], topKeywords = document.querySelector('#topKeywords'); input.addEventListener('keyup', function() { console.clear(); characterCount.innerHTML = input.value.length; var words = input.value.match(/\b[-?(\w+)?]+\b/gi); if (words) { wordCount.innerHTML = words.length; } else { wordCount.innerHTML = 0; } if (words) { var sentences = input.value.split(/[.|!|?]+/g); console.log(sentences); sentenceCount.innerHTML = sentences.length - 1; } else { sentenceCount.innerHTML = 0; } if (words) { var paragraphs = input.value.replace(/\n$/gm, '').split(/\n/); paragraphCount.innerHTML = paragraphs.length; } else { paragraphCount.innerHTML = 0; } if (words) { var seconds = Math.floor(words.length * 60 / 275); if (seconds > 59) { var minutes = Math.floor(seconds / 60); seconds = seconds - minutes * 60; readingTime.innerHTML = minutes + "m " + seconds + "s"; } else { readingTime.innerHTML = seconds + "s"; } } else { readingTime.innerHTML = "0s"; } if (words) { var nonStopWords = []; var stopWords = ["a", "able", "about", "above", "abst", "accordance", "according", "accordingly", "across", "act", "actually", "added", "adj", "affected", "affecting", "affects", "after", "afterwards", "again", "against", "ah", "all", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "announce", "another", "any", "anybody", "anyhow", "anymore", "anyone", "anything", "anyway", "anyways", "anywhere", "apparently", "approximately", "are", "aren", "arent", "arise", "around", "as", "aside", "ask", "asking", "at", "auth", "available", "away", "awfully", 
"b", "back", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "begin", "beginning", "beginnings", "begins", "behind", "being", "believe", "below", "beside", "besides", "between", "beyond", "biol", "both", "brief", "briefly", "but", "by", "c", "ca", "came", "can", "cannot", "can't", "cause", "causes", "certain", "certainly", "co", "com", "come", "comes", "contain", "containing", "contains", "could", "couldnt", "d", "date", "did", "didn't", "different", "do", "does", "doesn't", "doing", "done", "don't", "down", "downwards", "due", "during", "e", "each", "ed", "edu", "effect", "eg", "eight", "eighty", "either", "else", "elsewhere", "end", "ending", "enough", "especially", "et", "et-al", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "except", "f", "far", "few", "ff", "fifth", "first", "five", "fix", "followed", "following", "follows", "for", "former", "formerly", "forth", "found", "four", "from", "further", "furthermore", "g", "gave", "get", "gets", "getting", "give", "given", "gives", "giving", "go", "goes", "gone", "got", "gotten", "h", "had", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "hed", "hence", "her", "here", "hereafter", "hereby", "herein", "heres", "hereupon", "hers", "herself", "hes", "hi", "hid", "him", "himself", "his", "hither", "home", "how", "howbeit", "however", "hundred", "i", "id", "ie", "if", "i'll", "im", "immediate", "immediately", "importance", "important", "in", "inc", "indeed", "index", "information", "instead", "into", "invention", "inward", "is", "isn't", "it", "itd", "it'll", "its", "itself", "i've", "j", "just", "k", "keep", "keeps", "kept", "kg", "km", "know", "known", "knows", "l", "largely", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "lets", "like", "liked", "likely", "line", "little", "'ll", "look", "looking", "looks", "ltd", "m", "made", "mainly", "make", "makes", "many", "may", 
"maybe", "me", "mean", "means", "meantime", "meanwhile", "merely", "mg", "might", "million", "miss", "ml", "more", "moreover", "most", "mostly", "mr", "mrs", "much", "mug", "must", "my", "myself", "n", "na", "name", "namely", "nay", "nd", "near", "nearly", "necessarily", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "ninety", "no", "nobody", "non", "none", "nonetheless", "noone", "nor", "normally", "nos", "not", "noted", "nothing", "now", "nowhere", "o", "obtain", "obtained", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "omitted", "on", "once", "one", "ones", "only", "onto", "or", "ord", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "owing", "own", "p", "page", "pages", "part", "particular", "particularly", "past", "per", "perhaps", "placed", "please", "plus", "poorly", "possible", "possibly", "potentially", "pp", "predominantly", "present", "previously", "primarily", "probably", "promptly", "proud", "provides", "put", "q", "que", "quickly", "quite", "qv", "r", "ran", "rather", "rd", "re", "readily", "really", "recent", "recently", "ref", "refs", "regarding", "regardless", "regards", "related", "relatively", "research", "respectively", "resulted", "resulting", "results", "right", "run", "s", "said", "same", "saw", "say", "saying", "says", "sec", "section", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sent", "seven", "several", "shall", "she", "shed", "she'll", "shes", "should", "shouldn't", "show", "showed", "shown", "showns", "shows", "significant", "significantly", "similar", "similarly", "since", "six", "slightly", "so", "some", "somebody", "somehow", "someone", "somethan", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specifically", "specified", "specify", "specifying", "still", "stop", "strongly", "sub", "substantially", "successfully", "such", "sufficiently", "suggest", 
"sup", "sure", "t", "take", "taken", "taking", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "that'll", "thats", "that've", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "thered", "therefore", "therein", "there'll", "thereof", "therere", "theres", "thereto", "thereupon", "there've", "these", "they", "theyd", "they'll", "theyre", "they've", "think", "this", "those", "thou", "though", "thoughh", "thousand", "throug", "through", "throughout", "thru", "thus", "til", "tip", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "ts", "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlike", "unlikely", "until", "unto", "up", "upon", "ups", "us", "use", "used", "useful", "usefully", "usefulness", "uses", "using", "usually", "v", "value", "various", "'ve", "very", "via", "viz", "vol", "vols", "vs", "w", "want", "wants", "was", "wasn't", "way", "we", "wed", "welcome", "we'll", "went", "were", "weren't", "we've", "what", "whatever", "what'll", "whats", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "wheres", "whereupon", "wherever", "whether", "which", "while", "whim", "whither", "who", "whod", "whoever", "whole", "who'll", "whom", "whomever", "whos", "whose", "why", "widely", "willing", "wish", "with", "within", "without", "won't", "words", "world", "would", "wouldn't", "www", "x", "y", "yes", "yet", "you", "youd", "you'll", "your", "youre", "yours", "yourself", "yourselves", "you've", "z", "zero"]; for (var i = 0; i < words.length; i++) { if (stopWords.indexOf(words[i].toLowerCase()) === -1 && isNaN(words[i])) { nonStopWords.push(words[i].toLowerCase()); } } var keywords = {}; for (var i = 0; i < nonStopWords.length; i++) { if (nonStopWords[i] in keywords) { keywords[nonStopWords[i]] += 1; } else { keywords[nonStopWords[i]] = 1; } } var sortedKeywords = []; for (var keyword in keywords) { 
sortedKeywords.push([keyword, keywords[keyword]]) } sortedKeywords.sort(function(a, b) { return b[1] - a[1] }); topKeywords.innerHTML = ""; for (var i = 0; i < sortedKeywords.length && i < 4; i++) { var li = document.createElement('li'); li.innerHTML = "<b>" + sortedKeywords[i][0] + "</b>: " + sortedKeywords[i][1]; topKeywords.appendChild(li); } } if (words) { keywordsDiv.style.display = "block"; } else { keywordsDiv.style.display = "none"; } });
Hope you have enjoyed learning about how a JavaScript word counter works. If you loved this tool, you may also like my popular actual size online ruler and my AJAX-based WordPress password (or plain text) hasher.
Hi Ehi,
I made something similar for a newspaper years ago using AJAX, PHP explode and count on the server side to deal with strings including numbers.
One thing I found was the need to check that there are no instances of two words joined by a comma or full stop and no space, eg ‘test,me’ needs to be transformed to ‘test, me’.
Just my 2c 🙂
Regards
John
Hi John,
Good point! Even while creating this I knew there would be quite a few word combination scenarios and edge cases I may not have considered yet.
Thanks for pointing this out. I will look into modifying my regex to handle that situation.