Thursday, March 5, 2020

Regular Expressions in Python


Regular expression


Regular expressions, also known as regex, are sequences of characters that specify rules for matching substrings in a given string. Python provides a built-in module, "re", for working with regular expressions. For example, we can test whether a given string contains a specified pattern.


import re
p = 'python'
text = 'it is python programming'  # "text" rather than "str", which would shadow the built-in type
match = re.search(p, text)
print(match.span())  # print the (start, end) span of the match

#Output:
(6, 12)

The functions defined by the "re" module


The "re" module provides the following functions, among others.

Function Purpose
findall() Returns a list containing all non-overlapping matches found in the string
search() Returns a Match object for the first match found anywhere in the string, or None if there is no match
split() Returns a list of substrings obtained by splitting the string at each match of the pattern
sub() Replaces one or more matches of the pattern with the replacement string


Example

import re
text = 'it is python programming with python'
m = re.search('python', text)    # search: first match only
print(m.start())

f = re.findall('python', text)   # findall: every match
print(f)

parts = re.split(' ', text)      # split: cut the string at each space
print(parts)

s = re.sub(r'\s', '_', text)     # sub: the raw string keeps \s intact
print(s)


#Output:

6
['python', 'python']
['it', 'is', 'python', 'programming', 'with', 'python']
it_is_python_programming_with_python
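
The table says sub() replaces one or more matches. As a minimal sketch of how that choice can be controlled, the snippet below uses the optional count argument of re.sub() and the optional maxsplit argument of re.split(). The variable names are only illustrative.

```python
import re

text = 'it is python programming with python'

# count=1 limits re.sub to the first match; later matches are left alone
first_only = re.sub(r'python', 'Python', text, count=1)
print(first_only)   # it is Python programming with python

# maxsplit=2 stops after two cuts; the remainder stays in the last item
parts = re.split(r'\s', text, maxsplit=2)
print(parts)        # ['it', 'is', 'python programming with python']
```

Leaving count and maxsplit at their default of 0 gives the behaviour shown in the example above: every match is replaced or used as a split point.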


The metacharacters


Metacharacters are characters that have a special meaning inside a pattern:

Symbol Purpose
[] A set of characters
\ Signals a special sequence (can also be used to escape special characters)
. Any character (except newline character)
^ Starts with
$ Ends with
* Zero or more occurrences
+ One or more occurrences
{} Exactly the specified number of occurrences
| Either or
() Capture and group

Example

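import re
str1 = 'it is python programming with python geeks'
str2 = 'aksnaks5nsk3nkm5'
m = re.search(r'[135]', str2)        # []: any one of 1, 3, 5 (a comma inside a set would match a literal comma)
print(m)

m = re.search(r'g..ks', str1)        # . : any two characters between 'g' and 'ks'
print(m)

m = re.search(r'^it', str1)          # ^ : the string starts with 'it'
print(m)

m = re.search(r'geeks$', str1)       # $ : the string ends with 'geeks'
print(m)

m = re.search(r'py*', str1)          # * : 'p' followed by zero or more 'y'
print(m)

m = re.search(r'python|java', str1)  # | : either 'python' or 'java'
print(m)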

#Output:
<re.Match object; span=(7, 8), match='5'>
<re.Match object; span=(37, 42), match='geeks'>
<re.Match object; span=(0, 2), match='it'>
<re.Match object; span=(37, 42), match='geeks'>
<re.Match object; span=(6, 8), match='py'>
<re.Match object; span=(6, 12), match='python'>
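
The example above does not exercise +, {}, () or the escaping role of \. The following is a small sketch of those four, reusing the same sentence. The string assigned to price is a made-up sample, not part of the original example.

```python
import re

text = 'it is python programming with python geeks'

# + : 'e' must occur one or more times between 'g' and 'ks'
plus_match = re.search(r'ge+ks', text)
print(plus_match.group())      # geeks

# {2} : exactly two consecutive 'm' characters
brace_match = re.search(r'm{2}', text)
print(brace_match.span())      # (19, 21)

# () : each pair of parentheses captures its own group
group_match = re.search(r'(py)(thon)', text)
print(group_match.groups())    # ('py', 'thon')

# \ : escape the dot so it matches a literal '.' rather than any character
price = 'total 4.50'           # hypothetical sample string
dot_match = re.search(r'4\.50', price)
print(dot_match.group())       # 4.50
```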

Special sequences


Special sequences are escape sequences with a predefined meaning. Each one starts with the '\' character.


Symbol Purpose
\A Matches only if the given characters are at the beginning of the string
\b Matches where the given characters are at the beginning or at the end of a word
\B Matches where the given characters are present, but not at the beginning (or at the end) of a word
\d Matches any digit (for example 0-9)
\D Matches any character that is not a digit
\s Matches any whitespace character
\S Matches any character that is not whitespace
\w Matches any word character (a letter, a digit, or an underscore)
\W Matches any character that is not a word character
\Z Matches only if the given characters are at the end of the string

Example

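import re
str1 = 'it is python programming with python geeks'
m = re.search(r'\Ait', str1)     # search: \A anchors 'it' to the start of the string
print(m)

m = re.findall(r'\s', str1)      # findall: every whitespace character
print(m)

m = re.sub(r'\sp', ' P', str1)   # sub: whitespace followed by 'p' becomes ' P'
print(m)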
print(m)

#Output:
<re.Match object; span=(0, 2), match='it'>
[' ', ' ', ' ', ' ', ' ', ' ']
it is Python Programming with Python geeks
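
The example above covers \A and \s only. As a rough sketch of some of the remaining sequences from the table, the snippet below tries \d, \b, \B, \w and \Z. Both sample strings are invented for illustration.

```python
import re

text = 'room 42 is open'    # hypothetical sample string
phrase = 'this is it'       # hypothetical sample string

# \d : every single digit in the string
digits = re.findall(r'\d', text)
print(digits)               # ['4', '2']

# \b : 'is' as a whole word, with a word boundary on both sides
whole_word = re.search(r'\bis\b', phrase)
print(whole_word.span())    # (5, 7)

# \B : 'is' that does NOT start at a word boundary (the one inside 'this')
inside_word = re.search(r'\Bis', phrase)
print(inside_word.span())   # (2, 4)

# \w+ : runs of letters, digits or underscores
words = re.findall(r'\w+', text)
print(words)                # ['room', '42', 'is', 'open']

# \Z : 'open' must sit at the very end of the string
at_end = re.search(r'open\Z', text)
print(at_end is not None)   # True
```

Writing the patterns as raw strings (the r prefix) matters most for \b, which Python would otherwise read as a backspace character before the "re" module ever sees it.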


The match object


The match object contains information about the match result.
 If nothing matches, None is returned instead of a match object.
 The examples here use three of its members.

span(): a method that returns a tuple of the start and end positions of the match
string: an attribute holding the input string that was searched
group(): a method that returns the part of the string where the match was found

Example

import re
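text = 'it is python programming with python geeks'  # "text" rather than "str", which would shadow the built-in type

m = re.search('python', text)  # search returns a match object, or None if nothing matches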

print(m)
print('Span->',m.span())
print('Group->'+m.group())
print('String->'+m.string)

#Output:

<re.Match object; span=(6, 12), match='python'>
Span-> (6, 12)
Group->python
String->it is python programming with python geeks
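
Because search() returns None when nothing matches, calling span() or group() on the result without a check raises an AttributeError. One possible way to guard against that is sketched below. find_span is a hypothetical helper written for this illustration, not a function of the "re" module.

```python
import re

text = 'it is python programming with python geeks'

def find_span(pattern, string):
    """Return the (start, end) span of the first match, or None if there is none."""
    m = re.search(pattern, string)
    if m is None:          # no match: there is no match object to query
        return None
    return m.span()

print(find_span(r'python', text))   # (6, 12)
print(find_span(r'java', text))     # None
```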