Università della Svizzera italiana
Facoltà di scienze della comunicazione
MSc in Communication Sciences 2010-11
Program in Technologies for Human Communication
Davide Eynard
Software Technology 2
02 - Regular expressions 2
Javascript basics
 To test regular expressions in JavaScript, you need to know at
least some basic notions of the language:
 Printing information
 Variable (scalar and arrays) declaration and assignment
 Conditions (if)
 Loops (while, for)
 Objects and method/function calls
 How can you test your code?
 Very simple examples: use the browser address bar or a
bookmarklet
 More complex ones (with many lines of code): use a
development environment such as the “jsenv” bookmarklet at
https://www.squarefree.com/bookmarklets/webdevel.html
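As a rough reminder of the notions listed above, the following sketch touches each of them once. The `show` helper and the sample names are assumptions made for this illustration; inside jsenv you would call `print` directly.

```javascript
// Printing information: "show" is a hypothetical stand-in for jsenv's print.
var out = [];
function show(text) {
  out.push(text);
  console.log(text);
}

// Variable declaration and assignment (a scalar and an array).
var name = "Lisa";
var children = ["Bart", "Lisa", "Maggie"];

// Condition (if).
if (children.indexOf(name) !== -1) {
  show(name + " is one of Homer's children");
}

// Loops (for, while).
for (var i = 0; i < children.length; i++) {
  show(i + ": " + children[i]);
}
var n = 0;
while (n < 2) {
  n++;
}

// Objects and method calls: strings are objects with their own methods.
show(name.toUpperCase());
```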
Javascript Regular Expressions
Using regular expressions in Javascript usually means
performing the following steps:
 Choose which text you want to parse (the regexp is always
applied to a text string!)
 Define a regular expression to match/extract/substitute
text within the chosen string (see previous lesson)
 Apply the correct methods to perform the desired
operation (whether it is matching, extraction, or
substitution):
 Methods connected to the “RegExp” object
 Methods connected to the “String” object
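The three steps above can be sketched as follows. The sample text and the variable names are assumptions chosen for the illustration, not part of the lesson material.

```javascript
// Step 1: choose the text to parse (a sample string, assumed here).
var text = "Homer Simpson works with Lenny Leonard and Carl Carlson";

// Step 2: define the regular expression.
var re = /(\w+) Simpson/;

// Step 3: apply a method of the RegExp object or of the String object.
var found = re.test(text);                      // matching (RegExp method)
var firstName = re.exec(text)[1];               // extraction (RegExp method)
var replaced = text.replace(re, "Mr. Simpson"); // substitution (String method)

console.log(found, firstName, replaced);
```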
Defining a regular expression/1
 To define a regular expression you can simply assign it to a
variable:
 var varName = /PATTERN/[g|i|m];
 Examples:
 var re = /ab+c/;
 var homerschild = /(Bart|Lisa|Maggie) Simpson/i;
 var divcontent = /<div>(.*?)<\/div>/gi;
(note the “\” escaping the “/” inside the last pattern!)
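To see what the g, i and m flags change in practice, here is a small sketch. The sample strings are assumptions; the String “match” method used here is described later in the lesson.

```javascript
var text = "Bart Simpson, LISA SIMPSON, Maggie Simpson";

var caseSensitive = /(Bart|Lisa|Maggie) Simpson/;
var ignoreCase = /(Bart|Lisa|Maggie) Simpson/i;
var globalIgnoreCase = /(Bart|Lisa|Maggie) Simpson/gi;

// Without "i" the uppercase name is not recognised.
var lisaStrict = caseSensitive.test("LISA SIMPSON"); // false
var lisaLoose = ignoreCase.test("LISA SIMPSON");     // true

// With "g" all the occurrences are considered, not just the first one.
var all = text.match(globalIgnoreCase);

// With "m" the anchors ^ and $ also match at line breaks.
var lineStarts = "a\nb".match(/^\w/gm);

console.log(lisaStrict, lisaLoose, all, lineStarts);
```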
Defining a regular expression/2
 Or, you can explicitly define it as an instance of the RegExp
object:
 var varName = new RegExp("PATTERN", "[g|i|m]");
 Examples:
 var re = new RegExp("ab+c");
 var homerschild = new RegExp("(Bart|Lisa|Maggie) Simpson", "i");
 var txt = new RegExp("<div>(.*?)</div>", "gi");
 Note that escaping the “/” is not needed in this last case.
However, every backslash already present in the regexp must be
doubled, because the pattern is now written as a string!
 re = /\w+\s/g;
becomes
 re = new RegExp("\\w+\\s", "g");
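A short check of the escaping rule, as a sketch: the two notations give the same pattern only when the backslashes are doubled in the string. The variable names are assumptions made for the example.

```javascript
var literal = /\w+\s/g;
var constructed = new RegExp("\\w+\\s", "g");

// Both objects hold the same pattern text: \w+\s
var samePattern = literal.source === constructed.source;

// Forgetting to double the backslashes changes the pattern:
// inside a string "\w" is just "w" and "\s" is just "s".
var forgotten = new RegExp("\w+\s", "g");
var forgottenPattern = forgotten.source; // the pattern is now w+s

// The "/" needs no escaping in the string notation.
var fromLiteral = /<div>(.*?)<\/div>/i.exec("<DIV>hello</DIV>")[1];
var fromString = new RegExp("<div>(.*?)</div>", "i").exec("<DIV>hello</DIV>")[1];

console.log(samePattern, forgottenPattern, fromLiteral, fromString);
```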
And now?
Which notation should we use?
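 Implicit (simple, literal notation)
   when you know the regexp in advance
   when the pattern never changes (a literal is compiled once, when the script is loaded)
   when you don't know how to deal with objects
 Explicit (object declaration)
   when you define the regexp at runtime (e.g. from user input or from another variable)
   when you know how to deal with objects
 Note: the explicit notation is not faster in itself, because the pattern is compiled at runtime, when the RegExp object is created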
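The typical reason for the explicit notation is a pattern that is only known at runtime. The sketch below assumes two hypothetical helpers, `escapeForRegExp` and `makeFinder`, which are not part of JavaScript or of the lesson.

```javascript
// Hypothetical helper: puts a backslash in front of every character
// that has a special meaning inside a regexp.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a global, case-insensitive regexp
// that matches the given word literally.
function makeFinder(word) {
  return new RegExp(escapeForRegExp(word), "gi");
}

var word = "c++"; // imagine this value comes from a form field
var finder = makeFinder(word);
var hits = "I like C++ more than c++ likes me".match(finder);

console.log(finder.source, hits);
```

Without the escaping step the two “+” characters would be read as quantifiers instead of literal plus signs.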
RegExp and String methods
 RegExp:
   test
   exec
   compile
 String:
   match
   search
   replace
   split
RegExp “test” method
 What does it do?
 The “test” method just checks if a pattern exists within a
string. It returns true if so, and false otherwise
 Usage:
 regexp.test(str);
 Where:
 regexp is the name of a regular expression variable
 str is the string against which we want to match the
regular expression
 Example (run it on Google News...):
var re = /Grande Fratello/i;
var s = document.documentElement.innerHTML;
if (re.test(s)) {
  alert("This is a Big Brother day!");
} else {
  alert("No Big Brother today!");
}
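The same check can be tried without a browser page. In this sketch the headlines are invented sample data standing in for a news page; the second part shows a caveat worth knowing when “test” is combined with the “g” flag.

```javascript
// Sample headlines, assumed here in place of a live news page.
var headlines = [
  "Grande Fratello: tonight the final",
  "Elections: the results",
  "Who will win the GRANDE FRATELLO?"
];

var re = /Grande Fratello/i;
var bigBrotherNews = headlines.filter(function (headline) {
  return re.test(headline);
});

// Caveat: a regexp with the "g" flag remembers where the last match ended
// (in its lastIndex property), so two identical calls can give two answers.
var globalRe = /a/g;
var firstCall = globalRe.test("a");  // true, lastIndex moves to 1
var secondCall = globalRe.test("a"); // false, the search starts after the "a"

console.log(bigBrotherNews.length, firstCall, secondCall);
```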
RegExp “exec” method/1
 What does it do?
 The “exec” method searches for matches inside a given
string. If matches are found, they are returned into an
array (otherwise the method returns null)
 Usage: array = regexp.exec(str);
 Where:
 regexp is the name of a regular expression
 str is the string against which to match the regular
expression
 Example (on Facebook friends phone list):
var re = new RegExp(
  "<div class=\"fsl fwb fcb\">.*?<a href=\"[^\"]+\">([^<]+)<" +
  ".*?<div class=\"fsl\">([^<]+)<span class=\"pls fss fcg\">([^<]+)</span>",
  "gi");
// (NOTE: the pattern is one string, split in two only to fit the line width)
var content = document.documentElement.innerHTML;
var array;
while ((array = re.exec(content)) !== null) {
  print(array[1] + ";" + array[2] + ";" + "\n");
}
RegExp “exec” method/2
 The returned array has a particular format
 index is the zero-based index of the match in the string
 input is the original string
 [0] is the whole portion of the string matched by the regexp
 [1], [2], ..., [n] are the parenthesized substring matches
(if they exist)
 Example:
var re = /a(b*)c/;
var str = "ccabcabbcbac";
var array = re.exec(str);
print(array.index); // prints "2"
print(array.input); // prints str
print(array[0]);    // prints "abc"
print(array[1]);    // prints "b"
RegExp “exec” method/3
 Given that the exec method returns null if no match is
found, it can be used inside a loop to match a regexp many
times inside a document
 Example:
var re = /a(b*)c/g; // note the "g" here
var str = "ccabcabbcbac";
var array;
while (array = re.exec(str)) {
  print(array.index); // prints "2", "5", "10"
  print(array.input); // prints str
  print(array[0]);    // prints "abc", "abbc", "ac"
  print(array[1]);    // prints "b", "bb", ""
}
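The loop can be wrapped in a small function that collects the results instead of printing them. This is only a sketch: `findAll` is a hypothetical helper name, and it assumes the regexp it receives has the “g” flag.

```javascript
// Hypothetical helper: returns one object per match found in str.
function findAll(re, str) {
  var results = [];
  var m;
  // exec restarts from re.lastIndex at every call; this only advances
  // when the regexp has the "g" flag, otherwise the loop would never end.
  while ((m = re.exec(str)) !== null) {
    results.push({ index: m.index, match: m[0], group: m[1] });
  }
  return results;
}

var found = findAll(/a(b*)c/g, "ccabcabbcbac");

for (var i = 0; i < found.length; i++) {
  console.log(found[i].index + ": " + found[i].match + " (" + found[i].group + ")");
}
```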
RegExp “compile” method
 What does it do?
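 The “compile” method converts (compiles) the specified
pattern into the internal format of an existing regexp, replacing
the pattern it held before. It was meant to speed up execution
 NOTE: compile is deprecated in current JavaScript; creating a
new RegExp object is the recommended alternative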
 Usage:
 regexp.compile("PATTERN", "[g|i|m]");
 Where:
 regexp is the name of a regular expression
 PATTERN is the text of the regular expression
 Example:
var re = new RegExp();
re.compile("c*ba", "i"); // now matches c*ba
var str = "abcabcbac";
var array = re.exec(str);
print(array);
String “match” method
 What does it do?
 The “match” method works like exec, but it is called on a
string (and takes a regexp as its parameter)
 NOTE: with the “g” flag, match returns an array containing only
the matched texts, without index, input and parenthesized
substrings; to loop over the matches and read their groups, use
exec instead
 Usage:
 str.match(regexp)
like: regexp.exec(str)
 Where:
 str is the string against which to match the regular
expression
 regexp is the name of a regular expression
 Example:
var re = /a(b*)c/;
var str = "ccabcabcbac";
var array = str.match(re); // only change here
print(array.index); // prints "2"
print(array.input); // prints str
print(array[0]);    // prints "abc"
print(array[1]);    // prints "b"
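The difference between matching with and without the “g” flag can be seen in a short sketch; the variable names are assumptions made for the example.

```javascript
var str = "ccabcabbcbac";

// Without "g": same result as exec (first match, with index and groups).
var first = str.match(/a(b*)c/);

// With "g": a plain array with all the matched texts, and nothing else.
var all = str.match(/a(b*)c/g);

// No match at all: null, exactly like exec.
var none = str.match(/xyz/);

console.log(first.index, first[1], all, none);
```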
String “search” method
 What does it do?
 The “search” method is similar to test, but it is called on a
string (and takes a regexp as its parameter) and it returns the
index of the first match, or -1 if there is no match
 Usage:
 str.search(regexp)
like: regexp.test(str), but compare the result with -1
 Where:
 str is the string against which to match the regular
expression
 regexp is the name of a regular expression
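Since “search” returns a position and not true/false, its result has to be compared with -1. A minimal sketch, with sample strings chosen for the illustration:

```javascript
var str = "ccabcabbcbac";

var where = str.search(/a(b*)c/); // position of the first match
var missing = str.search(/xyz/);  // -1 means "not found"

// A match at position 0 is a valid result, but 0 counts as false in an "if",
// so always compare with -1 instead of using the result as a condition.
var atStart = "abc".search(/a/);
var foundAtStart = atStart !== -1;

console.log(where, missing, atStart, foundAtStart);
```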
String “replace” method/1
 What does it do?
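 The “replace” method returns a new string in which the text
matched by the regexp is replaced (only the first match, unless
the regexp has the “g” flag); the original string is not modified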
 Usage:
 newstr = str.replace(regexp, replaceStr)
 Where:
 str is the string against which to match the regular
expression
 regexp is the name of a regular expression
 replaceStr is a string describing how the substitution has
to be made
 Example:
var re = /a(b*)c/;
var str = "ccabcabcbac";
var newstr = str.replace(re, "xxx");
print(newstr); // prints "ccxxxabcbac"
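As a sketch of how the “g” flag changes the result, the same substitution is applied here with and without it; the variable names are assumptions.

```javascript
var str = "ccabcabcbac";

var firstOnly = str.replace(/a(b*)c/, "xxx");  // replaces the first match only
var everyOne = str.replace(/a(b*)c/g, "xxx");  // replaces all the matches

console.log(firstOnly); // "ccxxxabcbac"
console.log(everyOne);  // "ccxxxxxxbxxx"
console.log(str);       // the original string is unchanged
```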
String “replace” method/2
 NOTE: replaceStr can contain placeholders ($1, $2, ...) that
stand for the parenthesized substrings matched by the regexp
 Example:
var re = /(\w+)\s(\w+)/g;
var str = "Jack Brown; Bob White; Jeff Green";
var newstr = str.replace(re, "$2,$1");
print(newstr); // prints "Brown,Jack; White,Bob; Green,Jeff"
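Another sketch of the placeholders, on invented sample dates: $1, $2 and $3 refer to the parenthesized groups, while $& stands for the whole matched text.

```javascript
var dates = "2010-11-02 and 2011-01-15";

// Reorder year-month-day into day/month/year using the three groups.
var european = dates.replace(/(\d{4})-(\d{2})-(\d{2})/g, "$3/$2/$1");

// $& inserts the whole match: here every four-digit year gets brackets.
var bracketed = dates.replace(/\d{4}/g, "[$&]");

console.log(european);  // "02/11/2010 and 15/01/2011"
console.log(bracketed); // "[2010]-11-02 and [2011]-01-15"
```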
String “split” method
 What does it do?
 The “split” method scans a string for delimiters and splits
the string into a list of substrings, returning the resulting
list in the form of an array
 Usage:
 str.split(regexp)
 Where:
 str is the string against which to match the regular
expression
 regexp is the name of a regular expression
 Example:
var re = /;/;
var str = "Jack Brown; Bob White; Jeff Green";
var array = str.split(re);
print(array[0]); // prints "Jack Brown"
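Splitting on “;” alone leaves a leading space in front of the second and third names. A delimiter regexp that also swallows the surrounding whitespace avoids this; the sketch below compares the two.

```javascript
var str = "Jack Brown; Bob White; Jeff Green";

var rough = str.split(/;/);       // the space after each ";" stays in the pieces
var clean = str.split(/\s*;\s*/); // whitespace around ";" is part of the delimiter

console.log(rough); // ["Jack Brown", " Bob White", " Jeff Green"]
console.log(clean); // ["Jack Brown", "Bob White", "Jeff Green"]
```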
References
 Some Web references:
   http://www.regular-expressions.info/javascript.html
   https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Regular_Expressions
   http://www.javascriptref.com/examples/ch08-ed2/index.htm
 Some tools:
   https://www.squarefree.com/bookmarklets, and in particular the “jsenv” bookmarklet
   Installation instructions:
     Connect to https://www.squarefree.com/bookmarklets/webdevel.html
     Drag the “jsenv” button from the Web page to your bookmarks bar/folder
     Just click on the link within your bookmarks to open the environment
   Note: the tool works on the current Web page, so if you want it to run on another page just close it, open the new page, and then click on the bookmarklet again.
Exercises
The following is a regular expression that we created and
tested during the lesson
 Write a regexp which matches (and is able to extract) the
URL and the text connected with an anchor tag
 Example string to parse:
 <a href="http://blablabla">Click here</a>
 RegExp:
 <a href="([^"]+)">([^<]+)</a>
 Wrong RegExp:
 <a href="([^"]+)">(.[^<]+)</a>
(note the dot!)
 The version with the dot matches the example string
correctly; however, since the dot can also match “<”, it also
matches empty anchors followed by other text, like:
 <a href="http://blablabla"></a> (add whatever here)</a>
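Both versions can be compared on the two example strings. This is a sketch in the literal notation, where the “/” of the closing tag has to be escaped; the variable names are assumptions.

```javascript
var good = /<a href="([^"]+)">([^<]+)<\/a>/;
var wrong = /<a href="([^"]+)">(.[^<]+)<\/a>/; // note the dot!

var example = '<a href="http://blablabla">Click here</a>';
var emptyAnchor = '<a href="http://blablabla"></a> (add whatever here)</a>';

// On the example string both versions extract the URL and the text.
var goodResult = good.exec(example);
var wrongResult = wrong.exec(example);

// On the empty anchor only the version with the dot matches:
// the dot consumes the "<" of the first closing tag.
var goodOnEmpty = good.exec(emptyAnchor);
var wrongOnEmpty = wrong.exec(emptyAnchor);

console.log(goodResult[1], goodResult[2]);
console.log(goodOnEmpty, wrongOnEmpty[2]);
```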