Somewhat 'Smarter' Search and Replace

Let's say you have some old HTML 3.2 documents that you want to bring into the 21st century by updating them to HTML 4.01 or XHTML. You want to remove all those nasty font tags and replace them with styles. HTML-Tidy is a free HTML validator (one of the ones recommended by w3.org) that has this capability built in. With the click of a button, all font tags can be removed and replaced with span tags, each with an associated class defined in a style.

Some other search/replace examples:

[Digithead warning: Unless you carry your writing utensils in a pocket protector, you may not be interested in the following.]
HTML-Kit is a free, full-featured HTML and script editor (a text editor, not WYSIWYG) developed by Chami.com. Among its many features, it lets you build 'smart' search and replace operations, since your search phrase can be a regular expression. (Note: HTML Tidy, mentioned above, is also built in!)

I recently had occasion to use this to remove all the width="xx%" attributes in a very large HTML table that was generated in Excel '95 with Internet Assistant. This utility puts a width= attribute in every cell! The table had many columns, each with its own width. I could have issued several search/replace commands (one for each column width) as follows:
or I could issue one search and replace as a regular expression to get them all:
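For readers who want to try the same idea outside HTML-Kit, here is a rough sketch using Python's re module. The pattern is an assumption built from the description in this article (width=", some digits, a closing quote), with an optional % and the leading whitespace added so that no stray space is left behind. It is not HTML-Kit's own syntax, and the function name and sample row are made up for illustration.

```python
import re

# Assumed pattern: whitespace, then width=" + digits + optional % + closing quote.
# Requiring leading whitespace keeps attributes such as data-width untouched.
WIDTH_ATTR = re.compile(r'\s+width="\d+%?"', re.IGNORECASE)

def strip_width_attrs(html: str) -> str:
    """Remove every width="..." attribute by replacing it with nothing."""
    return WIDTH_ATTR.sub("", html)

# Hypothetical table row, similar to what a spreadsheet export might produce.
row = '<tr><td width="12%">Name</td><td width="30%">Address</td></tr>'
print(strip_width_attrs(row))
# <tr><td>Name</td><td>Address</td></tr>
```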
What this says is: search for width=" followed by a string of digits of any length, followed by another double quote, then remove that string (i.e. replace it with nothing).

Another, slightly more complicated example: say (for whatever reason) you had a document containing a series of related terms like p01, p0201, p031156, ... and (for some other reason) you needed them all changed to q01, q0201, q031156, ... (is this minding your p's and q's?). You cannot simply do a global replace of all p's with q's, since that would make changes you did not intend. Here again, a regular expression replace can make your work easy:
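As an illustration in a different tool, the same p-to-q replace might look like this in Python. This is only a sketch: Python writes the back-reference as \1, the \b word boundary is an extra assumption to keep a p inside an ordinary word from matching, and the function name and sample sentence are invented.

```python
import re

# p followed by one or more digits; the parentheses capture the digits.
# \b (an added assumption) stops a "p" inside a word such as "step2" from matching.
P_TERM = re.compile(r'\bp(\d+)')

def p_to_q(text: str) -> str:
    # \1 is Python's back-reference to the first parenthesized group.
    return P_TERM.sub(r'q\1', text)

sample = "See p01, p0201 and p031156 in the appendix; step2 is separate."
print(p_to_q(sample))
# See q01, q0201 and q031156 in the appendix; step2 is separate.
```

Words like "appendix" and "separate" are left alone because their p's are not followed by digits.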
The above says to search for a p followed by a series of digits of any length and replace it with a q followed by the same digits. The %3 is called a back-reference. In this case it refers back to the 3rd thing that is matched in the search p(\d+). The p is the first match. One would think the string of digits would be the second match, but (for some reason, I don't know why) there is an extra NULL match thrown in for parenthesized things, so it is referred to as the 3rd match in HTML-Kit.

This is where HTML-Kit's regular expressions differ from Perl's, for example. In all cases, things enclosed in parentheses are grouped as a matching entity, but in Perl you would refer back to the series of digits above as $1 instead of %3 (i.e. the first back-reference, since only things in parentheses are available as back-references). Also note the dollar sign instead of the percent sign. It is a bit confusing, but it does seem to be consistent once you know this. Just test your search/replace command on a single replace to make sure it works as expected before you hit 'Replace All'.
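The "try one replace first" habit carries over to scripted replaces as well. Below is a minimal Python sketch of that workflow, assuming the same p(\d+) pattern. It also prints what the parenthesized group captured, since group numbering is exactly the part that differs between tools. The variable names are illustrative only.

```python
import re

pattern = re.compile(r'p(\d+)')
text = "p01, p0201, p031156"

# Inspect the first match before replacing anything.
first = pattern.search(text)
print(first.group(0))  # whole match: p01
print(first.group(1))  # first parenthesized group: 01

# Dry run: replace only the first occurrence and check the result.
trial = pattern.sub(r'q\1', text, count=1)
print(trial)  # q01, p0201, p031156

# Once the single replace looks right, replace them all.
full = pattern.sub(r'q\1', text)
print(full)  # q01, q0201, q031156
```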
What the !^.*$! is a "regular expression"?