Università della Svizzera italiana
Facoltà di scienze della comunicazione
MSc in Communication Sciences 2010-11
Program in Technologies for Human Communication
Davide Eynard
Software Technology 2
02 - Regular expressions 2

Javascript basics
To test regular expressions in Javascript, you need to know at least some basic notions of this programming language:
- printing information
- variable (scalar and array) declaration and assignment
- conditions (if)
- loops (while, for)
- objects and method/function calls
How can you test your code?
- Very simple examples: use the browser address bar or a bookmarklet
- More complex ones (with many lines of code): use a development environment such as the one at https://www.squarefree.com/bookmarklets/webdevel.html (search for the “jsenv” bookmarklet)

Javascript Regular Expressions
Using regular expressions in Javascript usually means performing the following steps:
1. Choose the text you want to parse (a regexp is always applied to a text string!)
2. Define a regular expression to match/extract/substitute text within the chosen string (see previous lesson)
3. Apply the correct methods to perform the desired operation (matching, extraction, or substitution):
   - methods of the “RegExp” object
   - methods of the “String” object

Defining a regular expression/1
To define a regular expression you can simply assign a regexp literal to a variable:
    var varName = /PATTERN/[g|i|m];
where the optional flags can be combined: g = global search, i = case-insensitive, m = multiline.
Examples:
    var re = /ab+c/;
    var homerschild = /(Bart|Lisa|Maggie) Simpson/i;
    var divcontent = /<div>(.*?)<\/div>/gi;
Note the escaping of “/” with “\” in the last example: inside a literal, an unescaped “/” would end the pattern!
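The three steps and the literal definitions above can be tried with a minimal sketch like the following, which runs in Node.js or in a browser console (it uses console.log instead of the print function of the jsenv environment). The sample text is a hypothetical string made up for this example, and match() is one of the String methods presented later in these slides.

```javascript
// Step 1: choose the text to parse (hypothetical example string)
var text = "<div>Bart Simpson</div><div>LISA SIMPSON</div>";

// Step 2: define the regular expressions with the literal notation
var re = /ab+c/;
var homerschild = /(Bart|Lisa|Maggie) Simpson/i; // i: case-insensitive
var divcontent = /<div>(.*?)<\/div>/gi;          // "/" escaped as "\/"; g: global, i: case-insensitive

// Step 3: apply a method to perform the desired operation
var foundRe = re.test("xabbbcx");         // "abbbc" matches ab+c
var foundChild = homerschild.test(text);  // "Bart Simpson" is in the text
var divs = text.match(divcontent);        // with "g": every complete <div>...</div>

console.log(foundRe);          // true
console.log(foundChild);       // true
console.log(divs);             // an array with both <div> elements
console.log(divcontent.flags); // gi
```

The lazy quantifier “.*?” is what makes the last regexp stop at the first “</div>”, so the two elements are returned separately instead of as one long match.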
Defining a regular expression/2
Or, you can explicitly define it as an instance of the RegExp object:
    var varName = new RegExp("PATTERN", "[g|i|m]");
Examples:
    var re = new RegExp("ab+c");
    var homerschild = new RegExp("(Bart|Lisa|Maggie) Simpson", "i");
    var txt = new RegExp("<div>(.*?)</div>", "gi");
Note that escaping “/” is not needed in this case, because the pattern is delimited by the string quotes and not by slashes.
However, every backslash already present in the regexp must be doubled, because the backslash is also the escape character of Javascript strings:
    re = /\w+\s/g;
becomes
    re = new RegExp("\\w+\\s", "g");

And now? Which notation should we use?
Implicit (literal) notation:
- when you know the regexp in advance
- when you don't know how to deal with objects
Explicit notation (object declaration):
- when you define the regexp at runtime (e.g., building the pattern from user input or from other strings)
- when you know how to deal with objects
Note: the explicit notation is not faster in itself. A literal is compiled once when the script is loaded, while the constructor compiles the pattern at runtime, so for a constant pattern the literal is usually the efficient choice.

RegExp and String methods
RegExp: test, exec, compile
String: match, search, replace, split

RegExp “test” method
What does it do? The “test” method just checks whether a pattern exists within a string. It returns true if so, and false otherwise.
Usage:
    regexp.test(str);
Where:
- regexp is the name of a regular expression variable
- str is the string against which we want to match the regular expression
Example (run it on Google News...; “Grande Fratello” is the Italian edition of Big Brother):
    var re = /Grande Fratello/i;
    var s = document.documentElement.innerHTML;
    if (re.test(s)) {
        alert("This is a Big Brother day!");
    } else {
        alert("No Big Brother today!");
    }

RegExp “exec” method/1
What does it do? The “exec” method searches for matches inside a given string.
If a match is found, it is returned as an array (containing the matched text and the parenthesized substrings); otherwise the method returns null.
Usage:
    array = regexp.exec(str);
Where:
- regexp is the name of a regular expression
- str is the string against which to match the regular expression
Example (on the Facebook friends phone list; print() is provided by the jsenv environment):
    var re = new RegExp("<div class=\"fsl fwb fcb\">.*?<a href=\"[^\"]+\">([^<]+)<.*?<div class=\"fsl\">([^<]+)<span class=\"pls fss fcg\">([^<]+)</span>", "gi");
    var content = document.documentElement.innerHTML;
    var array;
    while (array = re.exec(content)) {
        print(array[1] + ";" + array[2] + ";" + array[3] + "\n");
    }
(NOTE: the RegExp definition is a single line of code, even if it is displayed on several lines!)

RegExp “exec” method/2
The returned array has a particular format:
- index is the zero-based index of the match in the string
- input is the original string
- [0] is the portion of the string matched by this call
- [1], [2], ..., [n] are the parenthesized substring matches (if they exist)
Example:
    var re = /a(b*)c/;
    var str = "ccabcabbcbac";
    var array = re.exec(str);
    print(array.index);   // prints “2”
    print(array.input);   // prints the content of str
    print(array[0]);      // prints “abc”
    print(array[1]);      // prints “b”

RegExp “exec” method/3
Given that the exec method returns null if no match is found, it can be used inside a loop to match a regexp many times inside a document (the regexp needs the “g” flag, otherwise exec always returns the first match and the loop never ends).
Example:
    var re = /a(b*)c/g;   // note the “g” here
    var str = "ccabcabbcbac";
    var array;
    while (array = re.exec(str)) {
        print(array.index);   // prints “2”, “5”, “10”
        print(array.input);   // prints the content of str (three times)
        print(array[0]);      // prints “abc”, “abbc”, “ac”
        print(array[1]);      // prints “b”, “bb”, “”
    }

RegExp “compile” method
What does it do? The “compile” method converts (compiles) the specified pattern into its internal format.
The goal was a faster execution. Note, however, that compile() is deprecated in modern Javascript: engines compile regexps automatically, and creating a new RegExp object is the recommended alternative.
Usage:
    regexp.compile("PATTERN", "[g|i|m]");
Where:
- regexp is the name of a regular expression
- PATTERN is the text of the regular expression
Example:
    var re = new RegExp();
    re.compile("c*ba", "i");
    var str = "abcabcbac";
    var array = re.exec(str);
    print(array);   // now matches c*ba: prints “cba”

String “match” method
What does it do? The “match” method is the same as exec, but it is called on a string (and requires a regexp as a parameter).
NOTE: for global matching and loops, use exec instead: when the regexp has the “g” flag, match returns an array with all the matched substrings, but without index, input, or parenthesized substring matches.
Usage:
    str.match(regexp)   is like   regexp.exec(str)
Where:
- str is the string against which to match the regular expression
- regexp is the name of a regular expression
Example:
    var re = /a(b*)c/;
    var str = "ccabcabcbac";
    var array = str.match(re);   // only change here
    print(array.index);   // prints “2”
    print(array.input);   // prints the content of str
    print(array[0]);      // prints “abc”
    print(array[1]);      // prints “b”

String “search” method
What does it do? The “search” method is similar to test, but it is called on a string (and requires a regexp as a parameter). Instead of true or false, it returns the index of the first match, or -1 if the pattern is not found.
Usage:
    str.search(regexp)   is like   regexp.test(str)
Where:
- str is the string against which to match the regular expression
- regexp is the name of a regular expression

String “replace” method/1
What does it do?
The “replace” method finds the matches of a regexp in a string and returns a new string in which the matched text is substituted with a replacement string (without the “g” flag, only the first match is replaced).
Usage:
    newstr = str.replace(regexp, replaceStr)
Where:
- str is the string against which to match the regular expression
- regexp is the name of a regular expression
- replaceStr is a string describing how the substitution has to be made
Example:
    var re = /a(b*)c/;
    var str = "ccabcabcbac";
    var newstr = str.replace(re, "xxx");
    print(newstr);   // prints “ccxxxabcbac”

String “replace” method/2
NOTE: replaceStr can contain placeholders ($1, $2, ...) to insert the parenthesized substring matches into the replacement.
Example:
    var re = /(\w+)\s(\w+)/g;
    var str = "Jack Brown; Bob White; Jeff Green";
    var newstr = str.replace(re, "$2,$1");
    print(newstr);   // prints “Brown,Jack; White,Bob; Green,Jeff”

String “split” method
What does it do? The “split” method scans a string for delimiters (the matches of the regexp) and splits the string into a list of substrings, returning the resulting list in the form of an array.
Usage:
    str.split(regexp)
Where:
- str is the string to be split
- regexp is the regular expression describing the delimiters
Example:
    var re = /;/;
    var str = "Jack Brown; Bob White; Jeff Green";
    var array = str.split(re);
    print(array[0]);   // prints “Jack Brown”

References
Some Web references:
- http://www.regular-expressions.info/javascript.html
- https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Regular_Expressions
- http://www.javascriptref.com/examples/ch08-ed2/index.htm
Some tools:
- https://www.squarefree.com/bookmarklets, and in particular the “jsenv” bookmarklet
Installation instructions:
1. Connect to https://www.squarefree.com/bookmarklets/webdevel.html
2. Drag the “jsenv” button from the Web page to your bookmarks bar/folder
3. Just click on the link within your bookmarks to open the environment
Note: the tool works on the current Web page, so if you want it to run on another page just close it, open the new page,
and then click on the bookmarklet again.

Exercises
The following is a regular expression that we created and tested during the lesson.
Write a regexp which matches (and is able to extract) the URL and the link text of an anchor tag.
Example string to parse:
    <a href="http://blablabla">Click here</a>
RegExp:
    <a href="([^"]+)">([^<]+)</a>
Wrong RegExp:
    <a href="([^"]+)">(.[^<]+)</a>   (note the dot!)
The version with the dot matches the example string correctly; however, it also matches an empty anchor followed by other text and a second closing tag, such as:
    <a href="http://blablabla"></a> (add whatever here)</a>
Here the dot matches the “<” of the first </a>, so the captured text becomes “</a> (add whatever here)”. The correct version does not match this string, because [^<]+ requires at least one character other than “<” right after the opening tag.
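The two regexps of the exercise can be compared with a short sketch, again runnable in Node.js or in a browser console (console.log replaces jsenv's print). It assumes simple anchors whose only attribute is href, exactly as in the example string; the helper name extractLinks is hypothetical. In the literal notation, the “/” of </a> has to be escaped.

```javascript
// Regexps from the exercise, written as literals ("/" in </a> escaped as "<\/a>")
var good = /<a href="([^"]+)">([^<]+)<\/a>/g;
var wrong = /<a href="([^"]+)">(.[^<]+)<\/a>/g; // note the dot!

// Hypothetical helper: collect [url, text] pairs using exec() in a loop.
// The "g" flag is required: without it, exec() would return the first
// match again and again, and the loop would never end.
function extractLinks(re, html) {
  var links = [];
  var array;
  re.lastIndex = 0; // restart from the beginning of the string
  while ((array = re.exec(html)) !== null) {
    links.push([array[1], array[2]]);
  }
  return links;
}

var example = '<a href="http://blablabla">Click here</a>';
var tricky = '<a href="http://blablabla"></a> (add whatever here)</a>';

var goodExample = extractLinks(good, example);   // one link: http://blablabla, Click here
var wrongExample = extractLinks(wrong, example); // same result as the correct regexp
var goodTricky = extractLinks(good, tricky);     // no links: the anchor text is empty
var wrongTricky = extractLinks(wrong, tricky);   // captures "</a> (add whatever here)"

console.log(goodExample, wrongExample, goodTricky, wrongTricky);
```

Resetting lastIndex makes the helper safe to call several times with the same global regexp, since exec() remembers in lastIndex where the previous search stopped.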