Regular Expressions in Python
Regular expression
Regular expressions (also known as regex) are sequences of characters that specify the rules for matching substrings in a given character sequence. Python provides a built-in module, "re", for working with regular expressions. With it, we can test whether a given string contains a specified pattern.
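```python
import re

pattern = 'python'
text = 'it is python programming'  # named text so the built-in str is not shadowed
match = re.search(pattern, text)
print(match.span())  # print the (start, end) span of the match

# Output:
# (6, 12)
```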
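Because `re.search` returns `None` when the pattern is absent, the yes/no test mentioned above can be written by comparing the result with `None`. The following is a minimal sketch; the helper name `contains_pattern` is an illustrative assumption and is not part of the "re" module.

```python
import re

def contains_pattern(pattern, text):
    """Hypothetical helper: True if pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

found = contains_pattern('python', 'it is python programming')
missing = contains_pattern('java', 'it is python programming')
print(found)    # True
print(missing)  # False
```

Checking for `None` first also avoids the `AttributeError` that calling `span()` on a failed search would raise.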
The functions defined by the "re" module
The "re" module provides the following functions, among others.
Function | Purpose |
---|---|
findall() | Returns a list containing all non-overlapping matches found in the string |
search() | Returns a Match object for the first match found in the string, or None if there is no match |
split() | Returns a list of substrings obtained by splitting the string at each match of the input pattern |
sub() | Returns a new string in which one or many matches are replaced with the input string |
Example
```python
import re

text = 'it is python programming with python'

m = re.search('python', text)    # search: first match only
print(m.start())

f = re.findall('python', text)   # findall: every match
print(f)

parts = re.split(' ', text)      # split: cut the string at each space
print(parts)

s = re.sub(r'\s', '_', text)     # sub: replace each whitespace character (raw string keeps \s intact)
print(s)

# Output:
# 6
# ['python', 'python']
# ['it', 'is', 'python', 'programming', 'with', 'python']
# it_is_python_programming_with_python
```
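The table says that sub() replaces "one or many" matches. As a short sketch of how that choice is made: `re.sub` accepts an optional `count` argument, where the default of 0 replaces every match and a positive value limits the number of replacements. The variable names here are illustrative.

```python
import re

text = 'it is python programming with python'

# Default (count=0): every match is replaced
all_replaced = re.sub('python', 'Python', text)
print(all_replaced)    # it is Python programming with Python

# count=1: only the first match is replaced
first_replaced = re.sub('python', 'Python', text, count=1)
print(first_replaced)  # it is Python programming with python
```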
The metacharacters
Metacharacters are characters that have a special meaning inside a pattern:
Symbol | Purpose |
---|---|
[] | A set of characters |
\ | Signals a special sequence (can also be used to escape special characters) |
. | Any character (except newline character) |
^ | Starts with |
$ | Ends with |
* | Zero or more occurrences |
+ | One or more occurrences |
{} | Exactly the specified number of occurrences (or a range, as in {2,4}) |
\| | Either or |
() | Capture and group |
Example
```python
import re

str1 = 'it is python programming with python geeks'
str2 = 'aksnaks5nsk3nkm5'

m = re.search('[135]', str2)        # []: any one of 1, 3 or 5 (commas inside a set would match literal commas)
print(m)
m = re.search('g..ks', str1)        # . : any single character
print(m)
m = re.search('^it', str1)          # ^ : starts with
print(m)
m = re.search('geeks$', str1)       # $ : ends with
print(m)
m = re.search('py*', str1)          # * : 'p' followed by zero or more 'y'
print(m)
m = re.search('python|java', str1)  # | : either or
print(m)

# Output (Python 3.7+; older versions print _sre.SRE_Match instead of re.Match):
# <re.Match object; span=(7, 8), match='5'>
# <re.Match object; span=(37, 42), match='geeks'>
# <re.Match object; span=(0, 2), match='it'>
# <re.Match object; span=(37, 42), match='geeks'>
# <re.Match object; span=(6, 8), match='py'>
# <re.Match object; span=(6, 12), match='python'>
```
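The example above does not exercise `+`, `{}`, `()` or the escaping role of `\`. The sketch below covers those four, using a made-up sample string. Patterns are written as raw strings (the `r` prefix) so that backslashes reach the "re" module unchanged.

```python
import re

# Made-up sample string for illustration
text = 'price: 1500 usd (approx.)'

m_plus = re.search(r'[0-9]+', text)              # + : one or more digits
print(m_plus.group())                            # 1500

m_two = re.search(r'[0-9]{2}', text)             # {2} : exactly two digits
print(m_two.group())                             # 15

m_group = re.search(r'([0-9]+) ([a-z]+)', text)  # () : capture two groups
print(m_group.group(1), m_group.group(2))        # 1500 usd

m_escaped = re.search(r'\(approx\.\)', text)     # \ : match ( . ) literally
print(m_escaped.group())                         # (approx.)
```

Without the backslashes, `(` and `)` would start and end a group, and `.` would match any character.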
Special sequences
Symbol | Purpose |
---|---|
\A | Matches if the given characters are at the beginning of the string |
\b | Matches where the given characters are at the beginning or at the end of a word |
\B | Matches where the given characters are present, but not at the beginning (or at the end) of a word |
\d | Matches any digit (such as 0-9) |
\D | Matches any character that is not a digit |
\s | Matches any whitespace character |
\S | Matches any character that is not whitespace |
\w | Matches any word character (letter, digit or underscore) |
\W | Matches any character that is not a letter, digit or underscore |
\Z | Matches if the given characters are at the end of the string |
Example
```python
import re

str1 = 'it is python programming with python geeks'

m = re.search(r'\Ait', str1)    # search: 'it' at the very start of the string
print(m)
m = re.findall(r'\s', str1)     # findall: every whitespace character
print(m)
m = re.sub(r'\sp', ' P', str1)  # sub: whitespace followed by 'p' becomes ' P'
print(m)

# Output (Python 3.7+):
# <re.Match object; span=(0, 2), match='it'>
# [' ', ' ', ' ', ' ', ' ', ' ']
# it is Python Programming with Python geeks
```
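The remaining sequences from the table (`\b`, `\B`, `\d`, `\D`, `\w`, `\W`, `\Z`) can be tried the same way. This is a small sketch on a made-up sample string, chosen so that each sequence has something to match.

```python
import re

# Made-up sample string for illustration
text = 'this room 42 is python3 ready'

word_is = re.search(r'\bis\b', text)      # \b : 'is' as a whole word
inner_is = re.search(r'\Bis', text)       # \B : 'is' not at the start of a word (inside 'this')
print(word_is.start(), inner_is.start())  # 13 2

digits = re.findall(r'\d', text)          # \d : every digit
print(digits)                             # ['4', '2', '3']

only_digits = re.sub(r'\D', '', text)     # \D : remove every non-digit
print(only_digits)                        # 423

words = re.findall(r'\w+', text)          # \w : runs of letters, digits, underscores
print(words)                              # ['this', 'room', '42', 'is', 'python3', 'ready']

joined = re.sub(r'\W', '_', text)         # \W : replace everything else
print(joined)                             # this_room_42_is_python3_ready

at_end = re.search(r'ready\Z', text)      # \Z : only at the end of the string
print(at_end.span())                      # (24, 29)
```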
The match object
A Match object contains information about the result of a match.
If nothing is matched, None is returned instead.
A Match object has three main components.
span(): a tuple composed of the starting and ending positions of the match
string: the input string that was searched
group(): the part of the string where the match was found
Example
```python
import re

text = 'it is python programming with python geeks'

m = re.search('python', text)  # search: returns the first match
print(m)
print('Span->', m.span())
print('Group->' + m.group())
print('String->' + m.string)

# Output (Python 3.7+):
# <re.Match object; span=(6, 12), match='python'>
# Span-> (6, 12)
# Group->python
# String->it is python programming with python geeks
```
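The three components are related: slicing the input string with the span yields the matched group. A brief sketch of that relationship, which also uses the `start()` and `end()` methods of the Match object:

```python
import re

text = 'it is python programming with python geeks'
m = re.search('python', text)

start, end = m.span()
print(start == m.start(), end == m.end())  # True True

# The span indexes into the original string and recovers the group
sliced = m.string[start:end]
print(sliced == m.group())                 # True
```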