- We are going to begin by writing a simple Ruby program to test regular expressions and then modify it several times. To begin, create a new file called regex_tester.rb using your preferred editor/IDE.
- Create another file called test_arrays.rb and add to it the arrays listed below.
# TEST ARRAYS FOR REGEX TESTER
# ----------------------------
%w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
%w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]
# INITIAL REGEX PATTERN FOR REGEX TESTER
# --------------------------------------
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
- In regex_tester.rb, create a new class called RegexTester. This class should have a constructor (initialize) that takes an argument called pattern with a default value of nil. Within this method, add a line of code that sets the instance variable @pattern to the local variable pattern. Finally, in case no value of pattern was passed when the object was created, we will generate a setter (and a getter) with the simple line: attr_accessor :pattern
-
Verify that your RegexTester class looks like this before going forward:
class RegexTester
  def initialize(pattern=nil)
    @pattern = pattern unless pattern.nil?
  end
  attr_accessor :pattern
end
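To see what the default value of nil buys us, here is a minimal sketch. It uses a hypothetical stand-in class called PatternHolder (not part of the lab) so that you can try both ways of supplying a pattern without touching your real file.

```ruby
# Hypothetical stand-in for RegexTester, reduced to the constructor.
class PatternHolder
  attr_accessor :pattern

  def initialize(pattern = nil)
    # With a nil argument nothing is assigned; reading an unassigned
    # instance variable returns nil, so the getter still works.
    @pattern = pattern unless pattern.nil?
  end
end

empty = PatternHolder.new
puts empty.pattern.inspect        # no pattern yet

empty.pattern = /^\d+$/           # set later through the generated setter
puts empty.pattern.inspect

preset = PatternHolder.new(/^www\./)
puts preset.pattern.inspect       # pattern supplied at construction time
```

Either route leaves the object with a usable pattern, which is why the lab can create a RegexTester first and assign the pattern afterwards.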
- We are going to build our getter and setter for statement manually so that (1) you know how it is done, and (2) we can gut them later in this lab. To create the setter method, type the following code into the class:
def statement=(statement)
  @statement = statement
end
To create the getter method, type the following code:
def statement
  @statement
end
By the way, have you been using Git to save all your code? If not, I suggest you start immediately and commit your code on a regular basis.
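The hand-written getter and setter above do the same job as attr_accessor. The sketch below puts the two side by side; the class names ManualAccessors and GeneratedAccessors are made up for this comparison and are not part of the lab.

```ruby
# Hypothetical classes for comparison only; neither belongs in regex_tester.rb.
class ManualAccessors
  # Setter: called when you write obj.statement = value
  def statement=(statement)
    @statement = statement
  end

  # Getter: called when you write obj.statement
  def statement
    @statement
  end
end

class GeneratedAccessors
  # Generates both methods shown above in one line.
  attr_accessor :statement
end

[ManualAccessors, GeneratedAccessors].each do |klass|
  obj = klass.new
  obj.statement = "http://www.google.com"
  puts "#{klass}: #{obj.statement}"
end
```

Writing the methods by hand only pays off when you want them to do more than store and return a value, which is where this lab is headed.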
- Time to test these methods. At the end of the file after the class has ended, add the following code:
regex = RegexTester.new
regex.pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/ # from test_arrays.rb
puts regex.pattern
regex.statement = "http://www.google.com"
puts regex.statement
Run the code and confirm that each getter prints the value we just assigned. The statement is printed exactly as we set it; the pattern is printed in Ruby's (?-mix:...) notation, which is simply how a Regexp converts to a string. Recall from lab 2 that you can type ruby regex_tester.rb in the terminal to run the code.
- Next we are going to create a method called test for the RegexTester class, which will test the statement to see if it matches the regex pattern. The code is below. Note that we are calling a method that is yet to be defined, pattern_matches?, and that it takes @statement as its argument without any parentheses. Also note that for statements that fail to match we are using Ruby's standard error output to print the message. Finally, remember that #{} within double quotes will evaluate the Ruby code inside and convert the result to a string.
def test
  if pattern_matches? @statement
    puts "MATCH: #{@statement}"
  else
    STDERR.puts "NO MATCH: #{@statement}"
  end
end
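The three Ruby features used in the test method (calls without parentheses, string interpolation, and standard error output) can be tried in isolation. This is a small sketch; long_enough? is a hypothetical helper that merely stands in for pattern_matches?.

```ruby
# Hypothetical helper, standing in for pattern_matches? in the lab.
def long_enough?(text)
  text.length > 5
end

word = "apidock.com"

# These two calls are equivalent; Ruby lets you drop the parentheses.
with_parens    = long_enough?(word)
without_parens = long_enough? word
puts with_parens == without_parens

# #{} evaluates the Ruby expression and converts the result to a string.
message = "#{word} has #{word.length} characters"

puts message          # written to standard output
STDERR.puts message   # written to standard error
```

Note that interpolation only happens inside double quotes; with single quotes the #{} would be printed literally.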
- Now it's time to add the pattern_matches? method. For this lab, we will make it a private method. To do this, we simply write the keyword private on the line before the method; this method (and any defined after the private declaration) will then be private. The code for this method is very simple:
def pattern_matches?(statement)
  statement.match(@pattern) != nil
end
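If you want to see what private actually prevents, the sketch below may help. Greeter and its methods are hypothetical names chosen for illustration; the point is only that a private method can be called from inside the object but not from outside it.

```ruby
# Hypothetical class, for illustration only.
class Greeter
  def greet(name)
    # Calling a private method from inside the object is fine.
    "Hello, #{decorate name}"
  end

  private

  # Everything defined after `private` can only be called from inside the object.
  def decorate(name)
    name.upcase
  end
end

greeter = Greeter.new
puts greeter.greet("ruby")

begin
  greeter.decorate("ruby")   # calling it from outside is not allowed
rescue NoMethodError => e
  puts "Blocked: #{e.class}"
end
```

In RegexTester this means regex.test may use pattern_matches? internally, while regex.pattern_matches?("...") from your test code would raise NoMethodError.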
- Now it is time to test this. Assuming you still have the test code from step 7, just add the lines below and run it. Notice that when you run it the first statement is a match but that the second statement fails and is printed in red (if supported by your OS).
puts "------"
regex.test
regex.statement = "apidock.com"
regex.test
- Now we'd really like to test a batch of statements all at once, so we will have to modify this code a tad. Before jumping into that, let's take a quick break by whipping up a solution in irb first. (This will remind you how to use irb as well.) Fire up irb on the command line and test with the block:
arr = %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
arr.each { |item| puts "MATCH #{item}" if item.match(pattern) != nil }
Notice how it converts %w[] to a regular array of strings (we saw this in an earlier lab). Also notice that a regex literal is delimited by forward slashes, thus pattern = /.../.
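A short sketch of the two literal forms just mentioned. The %r{...} form shown here is not used elsewhere in this lab; it is included only as an alternative way of writing the same kind of regex literal.

```ruby
# %w[] builds an array of strings, splitting on whitespace.
words = %w[apidock.com www.cmu.edu]
puts words.inspect
puts words.class

# Two ways to write the same regex literal.
slash_form   = /^www\.\w+\.edu$/
percent_form = %r{^www\.\w+\.edu$}
puts slash_form == percent_form

# match returns a MatchData object on success and nil on failure,
# which is why the irb one-liner compares the result against nil.
puts "www.cmu.edu".match(slash_form).inspect
puts "apidock.com".match(slash_form).inspect
```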
- In this case, there should be four matches (google, microsoft, kli, cmu) from the array of possibilities. This should still be the case after we modify our RegexTester class. Show this to the TA and get the required check-off.
Stop
Show a TA your code to this point and your irb terminal to verify it is working properly. Make sure the TA initials your sheet.
- Continuing on, to convert our previous methods we will begin by renaming the getter method for statement to statements and changing the instance variable to @statements. Next, we want to change the setter method from statement= to statements= so that it raises a TypeError if anything other than an array is passed to the method and a RuntimeError if the array is empty. When either exception is raised, the user should get a helpful message explaining the problem, followed by a final insult (the same insult for either failure), followed by program termination. The revised method can be seen below:
def statements=(arr)
  begin
    raise TypeError unless arr.class == Array
    raise RuntimeError if arr.empty?
    @statements = arr
  rescue RuntimeError
    STDERR.puts "You need to have at least one statement to test against the pattern."
    add_insult
    exit
  rescue TypeError
    STDERR.puts "You must enter an ARRAY of statements to use this regex_tester."
    add_insult
    exit
  end
end
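To experiment with the raise/rescue flow without the program exiting on you, here is a sketch of the same checks in a hypothetical method called check_statements. It returns a symbol instead of printing and calling exit, and it attaches the rescue clauses directly to the method body, which Ruby allows in place of an explicit begin ... end.

```ruby
# Hypothetical validator mirroring the structure of statements=, but
# returning a symbol instead of printing a message and exiting.
def check_statements(arr)
  raise TypeError unless arr.class == Array   # checked first
  raise RuntimeError if arr.empty?            # only reached for arrays
  :ok
rescue RuntimeError
  :empty_array
rescue TypeError
  :not_an_array
end

puts check_statements(%w[www.cmu.edu apidock.com])
puts check_statements([])
puts check_statements("www.cmu.edu")
```

The order of the two raise lines matters: checking the type first means empty? is never called on something that is not an array.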
- Test the setter method to make sure these exceptions are handled properly. Assuming you did not create an add_insult method in advance, the tests failed with a NameError. Add the add_insult method to the private section of the class. An example of this method appears below, but feel free to create any appropriate insult (i.e., one without vulgar, sexist or racist language) that you prefer.
def add_insult
  STDERR.puts "-------------------------------------"
  STDERR.puts "As a coding infidel, you are hereby sentenced to death. The firing squad will be here shortly to carry out the execution. Please remain seated until they arrive. Thank you for your cooperation."
end
- Next we need to change the test method so that it iterates through the array and tests each item. This is not unlike what we just did in irb, except that our test code spans multiple lines, so by convention we will use do ... end rather than {} for our block. Try to do this on your own, but don't wait too long to ask a TA for help. [There is a lot more to this lab and we don't want you stuck here too long.]
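As a reminder of the two block styles, here is a sketch on unrelated data. The method label_readings and the temperature values are made up for illustration and are deliberately not the lab solution.

```ruby
# Hypothetical example: label temperature readings.
def label_readings(readings)
  # Multi-line block: do ... end is the conventional choice.
  readings.map do |reading|
    if reading > 30
      "hot: #{reading}"
    else
      "ok: #{reading}"
    end
  end
end

readings = [12, 31, 25]

# Single-line block: braces are the conventional choice.
readings.each { |reading| puts "reading: #{reading}" }

puts label_readings(readings)
```

Both forms pass a block to the method; the choice between them is mostly about readability once the body grows beyond one line.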
-
Test the revisions with the first test array provided in Part 1 Step 2. Similar to what we saw in irb, we should get the following results:
MATCH: http://www.google.com
MATCH: www.microsoft.com
MATCH: http://www.kli.org
MATCH: http://www.cmu.edu
NO MATCH: apidock.com
NO MATCH: http://www.heimann-family.org
NO MATCH: http://www.acac.net
NO MATCH: http://is.hss.cmu.edu
NO MATCH: www.amazon.co.uk
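Notice that here the standard error output comes after the regular output and that it is displayed in red on most systems. (Don't panic if it is not in red on your machine, or if the two kinds of lines are interleaved; standard output and standard error are separate streams.)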
- Comment out these test lines for now and let's switch gears slightly. We want to build a regex pattern for validating credit cards. To simplify matters for lab purposes, we assume that all valid credit cards must have 16 digits, with optional spaces or dashes breaking up those digits into four groups of four. The second test array has seven test statements: the first three are valid credit cards and the last four are invalid. We will begin this part of the lab by creating a new set of testing code:
cc = RegexTester.new
cc.statements = %w[ .. second test array goes here .. ]
Remember to copy in the second test array from Part 1 Step 2 of this lab. We will build the pattern slowly and in steps listed below.
- Allow for just 16 digits by adding
cc.pattern = /^\d{16}$/
and then test with
cc.test
Running this code should result in only the first statement passing and the rest failing.
- Now modify the pattern:
cc.pattern = /^\d{16}$|^(\d{4}-\d{4}-\d{4}-\d{4})$/
Rerun the code and the second statement should now pass. Be sure you understand why this works before proceeding (ask a TA for help if you are not sure).
- We want to shorten this pattern and get rid of some of the repetition. Change the pattern to
cc.pattern = /^\d{16}$|^((\d{4}-){4})$/
and see what happens. Why does it fail? Be sure to understand why (Ask a TA if needed) before proceeding.
- Let's fix that problem by changing the pattern above ever so slightly to
cc.pattern = /^\d{16}$|^((\d{4}-?){4})$/
What is the difference and why does it matter? Again, be sure to understand why before proceeding.
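One way to investigate the question is to probe both patterns with a few hand-picked strings and compare the results. The probe helper below is hypothetical, and the candidate with a trailing dash is an assumed extra test case that is not in the lab's test array.

```ruby
# Hypothetical helper: for each candidate string, record whether the
# pattern matches it.
def probe(pattern, candidates)
  candidates.each_with_object({}) do |candidate, results|
    results[candidate] = !candidate.match(pattern).nil?
  end
end

candidates = [
  "1234-5678-9012-3456",    # from the second test array
  "1234-5678-9012-3456-",   # assumed variant with a trailing dash
  "1234567890123456"        # from the second test array
]

required_dash = /^((\d{4}-){4})$/    # grouped part of the pattern from the previous step
optional_dash = /^((\d{4}-?){4})$/   # grouped part of the pattern from this step

[required_dash, optional_dash].each do |pattern|
  puts pattern.inspect
  probe(pattern, candidates).each do |candidate, matched|
    puts "  #{matched ? 'MATCH   ' : 'NO MATCH'} #{candidate}"
  end
end
```

Look at which candidates flip between the two runs and ask what each repetition of the group is required to consume.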
- We want to get the space recognized, so we will modify the pattern again to
cc.pattern = /^\d{16}$|^(\d{4}[ -]?){4}$/
Confirm that this works and be sure you know why.
- Wait! You just had a great epiphany and realize that the 'or' is no longer needed! Why, you could just shorten this regex to /^(\d{4}[ -]?){4}$/ and it would work. Try this and confirm your insight and be sure you fully understand why it works before getting the TA sign-off on this part.
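If you would like a quick check that is independent of your RegexTester class, the sketch below splits the second test array into matching and non-matching statements using the shortened pattern. The names CARDS, SHORT_PATTERN and split_by_pattern are made up for this example.

```ruby
# The second test array from test_arrays.rb, written out as plain strings.
CARDS = [
  "1234567890123456",
  "1234-5678-9012-3456",
  "1234 5678 9012 3456",
  "1234567890",
  "#1234567890123456",
  "1234|5678|9012|3456",
  "12345678901234567"
]

SHORT_PATTERN = /^(\d{4}[ -]?){4}$/

# Hypothetical helper: returns [matching, non_matching].
def split_by_pattern(pattern, statements)
  statements.partition { |statement| !statement.match(pattern).nil? }
end

valid, invalid = split_by_pattern(SHORT_PATTERN, CARDS)
puts "valid:   #{valid.inspect}"
puts "invalid: #{invalid.inspect}"
```

As stated earlier in the lab, the first three statements should land in the valid group and the last four in the invalid group.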
- Now that we can see how our little regex_tester will help us create viable regular expressions, use it to go back to the original pattern and first test array and revise the pattern so that it only allows proper URLs but also accepts all the items in the test array (which are in fact valid and working URLs). Keep making corrections to the pattern until all statements pass the test. A word to the wise: this is a great little study/practice tool as you prepare for Exam 2.
Stop
Show a TA that you have a working version of RegexTester. Make sure the TA initials your sheet.