Python) Regular Expression

2022. 11. 21. 11:18

A regular expression is a formal language for describing a set of strings that follow specific rules. It is used to search and replace strings in programming languages, text editors, and similar tools.
Checking an input string against certain conditions with ordinary conditional statements can get complicated, whereas a regular expression often does it in a line or two. The code is short, but it is hard to read unless you are familiar with the syntax. In Python, regular expressions are provided by the re module.

To see how much shorter and simpler the code gets, here is a version that does not use regular expressions.

This code masks the last seven digits of each ID number in the input with *.

data = """
        park 800905-1049118
        kim  700905-1059119
       """
result = []
for line in data.split('\n'):           # line = "        park 800905-1049118"
    word_result = []

    for word in line.split(' '):        # word = "park", then "800905-1049118"
        # An ID number is 14 characters: 6 digits, a hyphen, 7 digits
        if len(word) == 14 and word[:6].isdigit() and word[6] == '-' and word[7:].isdigit():
            word = word[:6] + '-' + '*******'   # word = "800905-*******"
        word_result.append(word)        # word_result = [..., "park", "800905-*******"]
    result.append(" ".join(word_result))
print('\n'.join(result))

With regular expressions, it gets much simpler.

With sub(), you can replace every part of a string that matches a pattern with another string.

import re

data = """
        park 800905-1049118
        kim  700905-1059119
       """

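# Regular expression: six digits, a hyphen, seven digits (a raw string keeps \d intact)
pat = re.compile(r'(\d{6})[-](\d{7})')

print(pat.sub(r"\g<1>-*******", data))   # keep group 1, mask the last seven digits

print(pat.sub(r"******-\g<2>", data))    # mask the first six digits, keep group 2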
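To see what the two parenthesized groups in the pattern above actually capture, here is a minimal sketch. It reuses the pattern from this post; the sample string and the variable names are made up for illustration.

```python
import re

# Same pattern as above, written as a raw string so \d reaches the regex engine untouched.
pat = re.compile(r'(\d{6})[-](\d{7})')

sample = 'park 800905-1049118'   # illustrative input
m = pat.search(sample)

front = m.group(1)   # first group: the six digits before the hyphen
back = m.group(2)    # second group: the seven digits after the hyphen
whole = m.group(0)   # group 0 is always the entire match

# \g<1> in the replacement string refers back to the first group.
masked = pat.sub(r'\g<1>-*******', sample)

print(front, back, whole)
print(masked)
```

Text outside the match ("park " here) is left untouched by sub().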

Meta characters

Meta character	Description	Example
[]	Matches any one character from the set.	"[a-z]"
\	Starts a special sequence or escapes a meta character.	"\d"
.	Matches any single character except a newline.	"Ja.v."
^	Matches at the beginning of the string.	"^Python"
$	Matches at the end of the string.	"Meadow$"
*	Zero or more occurrences of the preceding element.	"hello*"
+	One or more occurrences of the preceding element.	"hello+"
{}	An exact number (or range) of occurrences of the preceding element.	"python{2}"
|	Matches either the pattern on its left or the one on its right.	"hello|world"
()	Captures and groups part of the pattern.	"(agilemeadow)"
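The table can be checked against a few sample strings. The sketch below is only an illustration: the test strings are invented, and re.search() is used so that a pattern may match anywhere in the text unless it is anchored with ^ or $.

```python
import re

# Each tuple: (pattern, text, whether re.search is expected to find something)
checks = [
    (r'[a-z]', 'A1b', True),               # [] : any one character from the set
    (r'Ja.v.', 'Jaava', True),             # .  : any single character except newline
    (r'^Python', 'Python rocks', True),    # ^  : must be at the start of the string
    (r'^Python', 'I like Python', False),
    (r'Meadow$', 'agile Meadow', True),    # $  : must be at the end of the string
    (r'hello|world', 'big world', True),   # |  : either alternative
    (r'python{2}', 'pythonn', True),       # {2} applies only to the preceding 'n'
    (r'python{2}', 'python', False),
]

results = [bool(re.search(pattern, text)) for pattern, text, _ in checks]

for (pattern, text, expected), found in zip(checks, results):
    print(f'{pattern!r:15} {text!r:18} found={found}')
```

Note that a quantifier such as {2} binds to the single element right before it, so "python{2}" means "pytho" followed by two n characters, not the word repeated twice.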

[]

import re

# [abc] : matches one character that is a, b, or c
# match() only checks the beginning of the string
p1 = re.compile('[abc]')

print(p1.match('a'))                # match='a'
print(p1.match('before'))           # match='b'
print(p1.match('dude'))             # None

p = re.match('[abc]', 'a')          # match='a'
print(p)

*

Match if the preceding letter repeats zero or more times.

p3 = re.compile('ca*t')

print(p3.match('ct'))           # match='ct'
print(p3.match('cat'))          # match='cat'
print(p3.match('caaat'))        # match='caaat'

+

Match if the preceding letter repeats one or more times.

p4 = re.compile('ca+t')

print(p4.match('ca'))           # None
print(p4.match('cat'))          # match='cat'
print(p4.match('caaat'))        # match='caaat'

{a}

Match if the letter repeats exactly a times.

p5 = re.compile('ca{2}t')

print(p5.match('cat'))          # None
print(p5.match('caat'))         # match='caat'

{a, b}

Match if the letter repeats at least a and at most b times.

p6 = re.compile('ca{2,5}t')

print(p6.match('cat'))          # None
print(p6.match('caat'))         # match='caat'
print(p6.match('caaaaat'))      # match='caaaaat'

?

Match if the preceding letter appears zero times or once.

p7 = re.compile('ab?c')

print(p7.match('ac'))           # match='ac'
print(p7.match('abc'))          # match='abc'
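The repetition operators above can be compared side by side. This is a small sketch, not part of the original examples: it applies each quantifier to the letter a in the same "c...t" pattern and uses re.fullmatch() so that the whole word has to fit the pattern.

```python
import re

patterns = ['ca*t', 'ca+t', 'ca?t', 'ca{2}t', 'ca{2,5}t']

# Words with 0, 1, 2 and 6 letters 'a' between 'c' and 't'
words = ['c' + 'a' * n + 't' for n in (0, 1, 2, 6)]

table = {}
for pattern in patterns:
    # Keep only the words that the pattern matches in full
    table[pattern] = [w for w in words if re.fullmatch(pattern, w)]
    print(f'{pattern:10} {table[pattern]}')
```

The word with six a characters is rejected by ca{2,5}t because the upper bound is five.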

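Example

The results differ depending on whether you add + or not: [0-9] matches a single digit, while [0-9]+ matches a run of one or more digits.

\s matches a whitespace character (such as a space), whereas \S matches any non-whitespace character.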

import re

m1 = re.match('[0-9]', '1234')
print(m1)                           # match='1'
print(m1.group())                   # 1

m2 = re.match('[0-9]', 'abc')
print(m2)                           # None

m3 = re.match('[0-9]+', '1234')
print(m3)                           # match='1234'
print(m3.group())                   # 1234

m4 = re.match('[0-9]+', ' 1234')
print(m4)                           # None

m5 = re.match(r'\s[0-9]+', ' 1234')
print(m5)                           # match=' 1234'
print(m5.group())                   # ' 1234' (the leading space is part of the match)

m6 = re.search('[0-9]+', ' 1234')
print(m6)                           # match='1234'
print(m6.group())                   # 1234
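The example above only uses \s. As a complement, here is a short sketch of \S under the same conditions; the input string is the same ' 1234' with one leading space, and the variable names are illustrative.

```python
import re

text = ' 1234'   # one leading space, then digits

space = re.match(r'\s', text)        # the first character is whitespace, so this matches
no_match = re.match(r'\S+', text)    # \S cannot match the space at position 0
digits = re.search(r'\S+', text)     # search() moves past the space and finds the digits

print(repr(space.group()))
print(no_match)
print(digits.group())
```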

Searching strings

match()

match() looks for the pattern only at the beginning of the string.

import re

p = re.compile('[a-z]+')

m1 = p.match('python')
print(m1)                       # match='python'
m2 = p.match('Python')
print(m2)                       # None
m3 = p.match('pythoN')
print(m3)                       # match='pytho'
m4 = p.match('pyThon')
print(m4)                       # match='py'
m5 = p.match('3 python')
print(m5)                       # None

search()

search() scans the whole string and returns the first match.

print('search() function')
s1 = p.search('python')
s2 = p.search('Python')
s3 = p.search('pythoN')
s4 = p.search('pyThon')
s5 = p.search('3 python')
print(s1)                   # match='python'
print(s2)                   # match='ython'
print(s3)                   # match='pytho'
print(s4)                   # match='py'
print(s5)                   # match='python'
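The only difference between the two calls is where a match may begin. The sketch below tries to make that visible by comparing match() with a search() whose pattern is pinned to the start of the string with \A; the anchored pattern is added here purely for illustration and is not part of the examples above.

```python
import re

p = re.compile('[a-z]+')
anchored = re.compile(r'\A[a-z]+')   # \A pins the match to the start of the string

def text_of(m):
    """Return the matched text, or None when there was no match."""
    return m.group() if m else None

rows = []
for text in ['python', 'Python', '3 python']:
    rows.append((
        text,
        text_of(p.match(text)),          # must start at position 0
        text_of(p.search(text)),         # may start anywhere
        text_of(anchored.search(text)),  # behaves like match() for these inputs
    ))

for row in rows:
    print(row)
```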

findall()

findall() returns every non-overlapping match as a list of strings.

result1 = p.findall('life is too short')
print(type(result1))        # <class 'list'>
print(result1)              # ['life', 'is', 'too', 'short']

result2 = p.findall('Life is tOo shorT')
print(result2)              # ['ife', 'is', 't', 'o', 'shor']

finditer()

finditer() returns an iterator that yields a match object for each match.

result3 = p.finditer('life is too short')
print(type(result3))        # <class 'callable_iterator'>
print(result3)              # <callable_iterator object at 0x...> (the address varies)

for r in result3:
    print(r)

result4 = p.finditer('Life is tOo shorT')
for r in result4:
    print(r)
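Each item that finditer() yields is a match object, so the matched text and its position can be read from it. A minimal sketch, using the same pattern and sentence as above (the spans list is just an illustrative container):

```python
import re

p = re.compile('[a-z]+')

spans = []
for m in p.finditer('life is too short'):
    # group() is the matched text; start() and end() are its slice indices
    spans.append((m.group(), m.start(), m.end()))

for word, start, end in spans:
    print(f'{word!r} at [{start}:{end}]')
```

This is the main reason to prefer finditer() over findall() when positions matter, since findall() returns only the strings.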

sub()

sub() replaces every part of the string that matches the pattern with a replacement string.

pattern.sub(replacement, string, count=0)

import re

p = re.compile('blue|white|red')

# replace blue, white, or red with 'gold'
print(p.sub('gold', 'blue socks and red shoes'))                # gold socks and gold shoes

# count=1 replaces only the first match
print(p.sub('silver', 'blue socks and red shoes', count=1))     # silver socks and red shoes
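If you also want to know how many replacements were made, the re module offers subn(), which returns the new string together with the count. The sketch below reuses the pattern and sentence from above; subn() is not used elsewhere in this post and is shown only as a related option.

```python
import re

p = re.compile('blue|white|red')
text = 'blue socks and red shoes'

all_replaced = p.sub('gold', text)            # every match is replaced
first_only = p.sub('silver', text, count=1)   # only the first match is replaced
new_text, n = p.subn('gold', text)            # same result as sub(), plus the number of replacements

print(all_replaced)
print(first_only)
print(new_text, n)
```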

Example

The following code replaces the last four digits of each phone number with "####".

import re

s = """
    park 010-9999-9988
    kim 010-9909-7789
    lee 010-8789-7768
"""

# Group 1 keeps the first two blocks; the last four digits are replaced
pat = re.compile(r"(\d{3}[-]\d{4})[-]\d{4}")
result = pat.sub(r"\g<1>-####", s)

print(result)
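The replacement passed to sub() can also be a function that receives each match object and returns the replacement text. The sketch below is an alternative to the \g<1> version above, not the method used in this post; the helper name mask_last_block and the second capture group are assumptions added for illustration.

```python
import re

s = """
    park 010-9999-9988
    kim 010-9909-7789
"""

# Two groups this time: the part to keep and the part to hide.
pat = re.compile(r'(\d{3}-\d{4})-(\d{4})')

def mask_last_block(m):
    """Hypothetical helper: keep group 1, replace each digit of group 2 with '#'."""
    return m.group(1) + '-' + '#' * len(m.group(2))

result = pat.sub(mask_last_block, s)
print(result)
```

A function is handy when the replacement depends on what was matched, for example when the masked part can vary in length.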
