`\s ( *-? .*?)(?: \s | ]) /`

As is often the case, regular expressions don't lend themselves to immediate readability or comprehension. So let's examine this regex in more detail and explore exactly how and why it's able to extract proper nouns.

To begin, here is a more illuminating representation of the above regex:

[Image: annotated representation of the regex]

As you can see, there is one capture group, denoted by Group 1. This capture group represents the text that we are actually interested in: a noun ("Microsoft") or nouns ("iPad Air 2") which together form a proper name. The stuff to the left and right of Group 1 ensures that we capture whole groupings of proper nouns rather than fragments of them.

Here is what this regular expression does in plain English:

- Find one or more whitespace characters (spaces, tabs, and line breaks).
- Then capture text (Group 1) that:
  - optionally begins with the letter "i" or "e";
  - optionally has a dash after that letter;
  - must then contain one or more capital letters;
  - optionally contains additional characters (except line breaks).
- Finally, this capture group must be followed by one of the two below:
  - one or more whitespace characters, then either a lowercase letter between a–z or any character that is not a word character; or
  - any one of the following characters: ` ' ’ " ^, : - \ *.

Armed with this regex, we had everything we needed to parse proper nouns from any given website.

Here's how it all works from soup to nuts: first, we request the URL and extract the text from the response body. Then we split this large chunk of text at each new sentence, which gives us an array of sentences. We then iterate through this array, running the regex against each individual sentence, and build a new array containing all and only the proper nouns we're after. From there, we remove duplicates and re-sort the array by noun frequency, as I mentioned above.

And what do we have left? Why, an array of proper nouns, sorted by frequency. It's beautiful!

Sound interesting?

Are you an experienced software engineer who finds things like this interesting? Check out Hone's Career page and shoot us an email!
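The regex itself did not survive intact in this copy of the post, so here is a hedged Python sketch that rebuilds a pattern from the plain-English breakdown. It is an approximation, not the original expression: the exact character classes (for instance `[ie]?` for the optional leading letter, and reading the punctuation list as backtick, both apostrophes, double quote, caret, comma, colon, dash, backslash, asterisk and period) are this sketch's interpretation, and `PROPER_NOUN_RE` and `extract_proper_nouns` are hypothetical names.

```python
import re

# Approximate reconstruction of the proper-noun pattern from its plain-English
# description; the character classes are an interpretation, not the original.
PROPER_NOUN_RE = re.compile(
    r"\s+"                      # one or more whitespace characters
    r"("                        # Group 1: the proper noun(s) we keep
    r"[ie]?"                    #   optionally an "i" or "e" ...
    r"-?"                       #   ... optionally followed by a dash
    r"[A-Z]+"                   #   one or more capital letters
    r".*?"                      #   then, lazily, anything except line breaks
    r")"
    r"(?:"                      # must be followed by either:
    r"\s+[a-z\W]"               #   whitespace + a lowercase letter or non-word char
    r"|[`'’\"^,:\-\\*.]"        #   or one of the listed punctuation characters
    r")"
)


def extract_proper_nouns(sentence: str) -> list[str]:
    """Return the Group 1 text of every non-overlapping match, in order."""
    return PROPER_NOUN_RE.findall(sentence)
```

For example, `extract_proper_nouns("I love my iPad Air 2 and Microsoft products.")` returns `['iPad Air 2', 'Microsoft']`. Because the pattern requires leading whitespace, a capitalized word at the very start of a sentence (like "I" here) is not captured. The trailing group is non-capturing, mirroring the `(?: ...)` visible in the surviving fragment, so the terminating character is consumed together with each match.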
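The soup-to-nuts pipeline (fetch the page, extract its text, split it into sentences, run the regex on each sentence, then deduplicate and sort by frequency) can also be sketched in Python. This is an illustrative sketch under assumptions, not Hone's code: the helper names `fetch_text`, `html_to_text` and `proper_nouns_by_frequency` are hypothetical, the sentence splitter is a naive "break after `.`, `!` or `?` followed by whitespace" rule, text extraction uses the standard-library `HTMLParser`, and the regex is an approximation rebuilt from the plain-English description.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

# Approximation of the proper-noun regex, rebuilt from its plain-English description.
PROPER_NOUN_RE = re.compile(r"\s+([ie]?-?[A-Z]+.*?)(?:\s+[a-z\W]|[`'’\"^,:\-\\*.])")

# Hypothetical naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


class _TextExtractor(HTMLParser):
    """Collects visible text from an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Join the visible text chunks of an HTML document with single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url: str) -> str:
    """Step 1: request the URL and extract the text from the response body."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))


def proper_nouns_by_frequency(text: str) -> list[str]:
    # Step 2: split the text into an array of sentences.
    sentences = SENTENCE_SPLIT_RE.split(text)
    # Step 3: run the regex against each sentence and collect every proper noun.
    nouns = [noun for s in sentences for noun in PROPER_NOUN_RE.findall(s)]
    # Step 4: collapse duplicates and order by frequency, most frequent first
    # (ties keep the order in which the nouns were first seen).
    return [noun for noun, _ in Counter(nouns).most_common()]
```

Calling `proper_nouns_by_frequency(fetch_text(url))` runs every step against a live page. `Counter.most_common()` covers the deduplicate-and-sort step in one call; a sturdier sentence splitter would probably be needed in practice, since abbreviations such as "Mr." break the naive rule used here.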