67272

Lab 8: Writing Regular Expressions

Due Date: March 28

Objectives

  • teach students to write regular expressions
  • practice creating classes in Ruby
  • review using irb in development

Teaching Session Links
https://gist.github.com/conhanley/81e59231607bead677af7e203dfc2cbf

README.md
  1. We are going to begin by writing a simple Ruby program to test regular expressions and then modify it several times. To begin, create a new file called regex_tester.rb using your preferred editor/IDE.

  2. Create another file called test_arrays.rb and add to it the arrays listed below.

    # TEST ARRAYS FOR REGEX TESTER
    # ----------------------------
    %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
    
    %w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]
    
    # INITIAL REGEX PATTERN FOR REGEX TESTER
    # --------------------------------------
    pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
    
  3. In regex_tester.rb, create a new class called RegexTester. This class should have a constructor (initialize) that takes an argument called pattern with a default value of nil. Within this method, add a line of code that sets the instance variable @pattern to the local variable pattern. Finally, in case no value of pattern was passed when the object was created, we will generate a setter (and a getter) with the simple line: attr_accessor :pattern.

  4. Verify that your RegexTester class looks like this before going forward:

  class RegexTester
    def initialize(pattern=nil)
      @pattern = pattern unless pattern.nil?
    end

    attr_accessor :pattern
  
  end
  1. We are going to build our getters and setters for statement manually so that (1) you know how it is done, and (2) we can gut them later in this lab. To create the setter method type the following code into the class:
  def statement=(statement)
    @statement = statement
  end

To create the getter method, type the following code:

  def statement
    @statement
  end
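
For comparison, the hand-written pair above should behave the same as a single attr_accessor :statement line. A minimal sketch, assuming two throwaway classes (ManualDemo and AccessorDemo are hypothetical names, not part of the lab):

```ruby
# Hypothetical demo classes; the names are illustrative only.
class ManualDemo
  # Setter: runs when you write obj.statement = value
  def statement=(statement)
    @statement = statement
  end

  # Getter: runs when you write obj.statement
  def statement
    @statement
  end
end

class AccessorDemo
  # Generates an equivalent getter/setter pair in one line
  attr_accessor :statement
end

manual = ManualDemo.new
manual.statement = "http://www.google.com"

generated = AccessorDemo.new
generated.statement = "http://www.google.com"

# Both objects report the same value through their getters
puts manual.statement == generated.statement
```

Writing the pair by hand only pays off when the methods need extra logic, which is what happens to the setter later in this lab.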

By the way, have you been using Git to save all code? If not, I suggest you start immediately and commit your code to git on a regular basis.

  1. Time to test these methods. At the end of the file after the class has ended, add the following code:
  regex = RegexTester.new
  regex.pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/  # from test_arrays.rb
  puts regex.pattern
  regex.statement = "http://www.google.com"
  puts regex.statement

Run the code and see that the output of the getter statements is what we'd expect. The statement output matches what we set it to be. Recall from lab 2 that you can type ruby regex_tester.rb in terminal to run the code.

  1. Next we are going to create a method called test for the RegexTester class which will test the statement to see if it matches the regex pattern. The code is below. Note that we are calling a method yet to be defined, pattern_matches?, and that this method takes the argument @statement without parentheses. Also note that for non-matches we are using Ruby's standard error output to print the message. Finally, remember that #{} within double quotes will evaluate the Ruby code inside and convert the result to a string.
  def test
    if pattern_matches? @statement
      puts "MATCH: #{@statement}"
    else
      STDERR.puts "NO MATCH: #{@statement}"
    end
  end 
  1. Now it's time to add the pattern_matches? method. For this lab, we will make it a private method. To do this, we simply write the keyword private in the line ahead of the method; now this method (and any that follow after the private declaration) will be private. The code for this method is very simple:
  def pattern_matches? statement
    statement.match(@pattern) != nil
  end
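
To see what the private keyword changes, here is a small sketch, assuming a stripped-down stand-in class (PrivateDemo and its check method are hypothetical and exist only for this illustration). The private method works when called from inside the object but raises NoMethodError when called from outside:

```ruby
# Hypothetical stand-in for RegexTester, reduced to the private method.
class PrivateDemo
  def initialize(pattern)
    @pattern = pattern
  end

  # Public wrapper: private methods can be called from inside the object.
  def check(statement)
    pattern_matches?(statement)
  end

  private

  # Same body as the lab's method; everything below `private` is private.
  def pattern_matches?(statement)
    statement.match(@pattern) != nil
  end
end

demo = PrivateDemo.new(/^\d+$/)
puts demo.check("12345")   # goes through the public method

begin
  demo.pattern_matches?("12345")   # direct call from outside the object
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```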
  1. Now it is time to test this. Assuming you still have the test code from step 7, just add the lines below and run it. Notice that when you run it the first statement is a match but that the second statement fails and is printed in red (if supported by your OS).
  puts "------"
  regex.test
  regex.statement = "apidock.com"
  regex.test
  1. Now we'd really like to test a batch of statements all at once, so we will have to modify this code a tad. Before jumping into that, let's take a quick break by whipping up a solution in irb first. (This will remind you about how to use irb as well.) Fire up irb on the command line and test with the block:
  arr = %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
  pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
  arr.each { |item| puts "MATCH #{item}" if item.match(pattern) != nil }

Notice how irb converts %w[] to a regular array of strings (we saw this in an earlier lab). Also notice that a regex literal is delimited by slashes, hence pattern = /.../.
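
Both points can be checked directly in irb. The sketch below is only an illustration; the variable names (words, slashes, percent, from_str, sample) are made up for this example. It also shows two other standard ways to build the same pattern, %r{} and Regexp.new, which avoid escaping the forward slashes:

```ruby
# %w[] splits on whitespace and builds an ordinary array of strings.
words = %w[apidock.com www.cmu.edu]
puts words == ["apidock.com", "www.cmu.edu"]

# The same pattern written three ways; only the slash literal needs \/ .
slashes  = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/
percent  = %r{^(http://)?www\.\w+\.(com|edu|org)$}
from_str = Regexp.new('^(http://)?www\.\w+\.(com|edu|org)$')

# All three forms agree on a sample statement.
sample = "http://www.cmu.edu"
puts [slashes, percent, from_str].all? { |re| sample.match(re) != nil }
```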

  1. In this case, there should be four matches (google, microsoft, kli, cmu) from the array of possibilities. The same should hold after we modify our RegexTester class. Show this to the TA to get the required check-off.

Stop

Show a TA your code to this point and your irb terminal to verify it is working properly. Make sure the TA initials your sheet.


  1. Continuing on, we now convert the class to handle a batch of statements. Begin by renaming the getter method from statement to statements and changing the instance variable to @statements. Next, rename the setter from statement= to statements= and modify it so that it raises a TypeError if anything other than an array is passed to the method and a RuntimeError if the array is empty. When either exception occurs, the user should get a helpful message explaining the problem, followed by a final insult (the same insult for either failure), followed by program termination. The revised method can be seen below:
  def statements=(arr)
    begin
      raise TypeError unless arr.class == Array
      raise RuntimeError if arr.empty?
      @statements = arr

    rescue RuntimeError
      STDERR.puts "You need to have at least one statement to test against the pattern."
      add_insult
      exit
    rescue TypeError
      STDERR.puts "You must enter an ARRAY of statements to use this regex_tester." 
      add_insult
      exit  
    end
  end
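
The raise/rescue flow can also be tried in isolation, without terminating the program. This is a sketch under stated assumptions: validate_statements and describe are hypothetical helpers, not part of RegexTester, and the error messages are invented for the example. It uses is_a?(Array) in place of arr.class == Array, which additionally accepts subclasses of Array:

```ruby
# Hypothetical helper: raises the same exception classes as the lab's setter.
def validate_statements(arr)
  raise TypeError, "expected an Array, got #{arr.class}" unless arr.is_a?(Array)
  raise RuntimeError, "the array of statements is empty" if arr.empty?
  arr
end

# Hypothetical helper: returns a short label describing how validation ended.
def describe(input)
  validate_statements(input)
  "ok"
rescue TypeError, RuntimeError => e
  "#{e.class}: #{e.message}"
end

puts describe(%w[www.cmu.edu])   # a non-empty array passes
puts describe("www.cmu.edu")     # a String triggers the TypeError branch
puts describe([])                # an empty array triggers the RuntimeError branch
```

Because the type check comes first, a non-array never reaches the empty? test.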
  1. Test the setter method to make sure these exceptions are handled properly. Assuming you did not create an add_insult method in advance, the tests failed with an error. Add the add_insult method to the private section of the class. An example of this method appears below, but feel free to create any appropriate insult (i.e., one without vulgar, sexist or racist language) that you prefer.
  def add_insult
    STDERR.puts "-------------------------------------"
    STDERR.puts "As a coding infidel, you are hereby sentenced to death.  The firing squad will be here shortly to carry out the execution.  Please remain seated until they arrive. Thank you for your cooperation."   
  end
  1. Next we need to change the test method so that it iterates through the array and tests each item. This is not unlike what we just did in irb, except that our test code is multiple lines, so we will have to use do ... end rather than {} for our block. Try to do this on your own, but don't wait too long to ask a TA for help. [There is a lot more to this lab and we don't want you stuck here too long.]

  2. Test the revisions with the first test array provided in Part 1 Step 2. Similar to what we saw in irb, we should get the following results:

  MATCH: http://www.google.com
  MATCH: www.microsoft.com
  MATCH: http://www.kli.org
  MATCH: http://www.cmu.edu
  NO MATCH: apidock.com
  NO MATCH: http://www.heimann-family.org
  NO MATCH: http://www.acac.net
  NO MATCH: http://is.hss.cmu.edu
  NO MATCH: www.amazon.co.uk

Notice that the standard error output always comes after the regular output and that it is displayed in red on most systems. (Don't panic if not in red on your machine.)
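
The grouping above can be cross-checked independently of RegexTester. A minimal sketch using Array#partition (the variable names statements, matches and misses are assumptions for this example, and everything is printed to standard output here for simplicity):

```ruby
statements = %w[http://www.google.com apidock.com www.microsoft.com http://www.heimann-family.org http://www.kli.org http://www.acac.net http://www.cmu.edu http://is.hss.cmu.edu www.amazon.co.uk]
pattern = /^(http:\/\/)?www\.\w+\.(com|edu|org)$/

# partition returns two arrays: items for which the block is truthy, then the rest.
matches, misses = statements.partition { |s| s.match(pattern) }

matches.each { |s| puts "MATCH: #{s}" }
misses.each  { |s| puts "NO MATCH: #{s}" }
```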

  1. Comment out these test lines for now and let's switch gears slightly. We want to build a regex pattern for validating credit cards. To simplify matters for lab purposes, we assume that all valid credit cards must have 16 digits, with optional spaces or dashes breaking up those digits into 4 groups of four. The second test array has seven test statements: the first three are valid credit cards and the last four are invalid. We will begin this part of the lab by creating a new set of testing code:
  cc = RegexTester.new
  cc.statements = %w[ .. second test array goes here .. ]

Remember to copy in the second test array from Part 1 Step 2 of this lab. We will build the pattern slowly and in steps listed below.

  1. Allow for just 16 digits by adding
  cc.pattern = /^\d{16}$/

and then test with

  cc.test

Running this code should result in only the first statement passing and the rest failing.

  1. Now modify the pattern:
  cc.pattern = /^\d{16}$|^(\d{4}-\d{4}-\d{4}-\d{4})$/

Rerun the code and the second statement should now pass. Be sure you understand why this works before proceeding (ask a TA for help if you are not sure).

  1. We want to shorten this pattern and get rid of some of the repetition. Change the pattern to
  cc.pattern = /^\d{16}$|^((\d{4}-){4})$/

and see what happens. Why does it fail? Be sure to understand why (Ask a TA if needed) before proceeding.

  1. Let's fix that problem by changing the pattern above ever so slightly to
  cc.pattern = /^\d{16}$|^((\d{4}-?){4})$/

What is the difference and why does it matter? Again, be sure to understand why before proceeding.
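
If you want to check your reasoning, one way is to run both patterns against the dashed statement, with and without a trailing dash. The variable names below (strict, optional, card, with_trailing) are assumptions made for this sketch:

```ruby
strict   = /^\d{16}$|^((\d{4}-){4})$/    # every group of four digits must end with a dash
optional = /^\d{16}$|^((\d{4}-?){4})$/   # the dash after each group may be absent

card          = "1234-5678-9012-3456"
with_trailing = card + "-"

# Each line prints whether the statement matches the pattern.
puts "strict,   no trailing dash: #{!card.match(strict).nil?}"
puts "strict,   trailing dash:    #{!with_trailing.match(strict).nil?}"
puts "optional, no trailing dash: #{!card.match(optional).nil?}"
```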

  1. We want to get the space recognized, so we will modify the pattern again to
  cc.pattern = /^\d{16}$|^(\d{4}[ -]?){4}$/

Confirm that this works and be sure you know why.
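
As a quick confirmation outside the class, the pattern can be run over the whole second test array in irb or a scratch file. This is only a sketch; cards and label are names chosen for the example:

```ruby
cards = %w[1234567890123456 1234-5678-9012-3456 1234\ 5678\ 9012\ 3456 1234567890 #1234567890123456 1234|5678|9012|3456 12345678901234567]
pattern = /^\d{16}$|^(\d{4}[ -]?){4}$/

# Multi-line block, so do ... end is used rather than { }.
cards.each do |card|
  label = card.match(pattern) ? "MATCH" : "NO MATCH"
  puts "#{label}: #{card}"
end
```

One thing worth noticing while experimenting: because the separator is optional after every group, this pattern also accepts mixed separators and a trailing separator such as 1234-5678-9012-3456-. That is acceptable under this lab's simplified rules, but keep it in mind when judging how strict a pattern really is.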

  1. Wait! You just had a great epiphany and realize that the 'or' is no longer needed! Why, you could just shorten the regex to /^(\d{4}[ -]?){4}$/ and it would work. Try this to confirm your insight, and be sure you fully understand why it works before getting the TA sign-off on this part.

  2. Now that we can see how our little regex_tester will help us create viable regular expressions, use it to go back to the original pattern and first test array. Revise the pattern so that it only allows proper URLs but also accepts all the items in the test array (which are in fact valid and working URLs). Keep correcting the pattern until all statements pass the test. A word to the wise: this is a great little study/practice tool as you prepare for Exam 2.


Stop

Show a TA that you have a working version of RegexTester. Make sure the TA initials your sheet.