A regular expression is a formal language for describing a set of strings that follow specific rules. Regular expressions are used to search for and replace strings in programming languages, text editors, and similar tools.
Expressing certain conditions on an input string with ordinary conditional statements can get complicated, whereas a regular expression often does it in one short pattern. The trade-off is readability: the code is short, but it is hard to understand unless you are familiar with the syntax. In Python, regular expressions are provided by the re module.
To see how much shorter the code gets, here is the same task first written without regular expressions.
This code replaces the last seven digits of each ID number in the input with *.
data = """
park 800905-1049118
kim 700905-1059119
"""
result = []
for line in data.split('\n'):        # e.g. line = "park 800905-1049118"
    word_result = []
    for word in line.split(' '):     # e.g. word = "park", then "800905-1049118"
        # An ID number is 6 digits, a hyphen, and 7 digits (14 characters in total).
        if len(word) == 14 and word[:6].isdigit() and word[6] == '-' and word[7:].isdigit():
            word = word[:6] + '-' + '*******'   # word = "800905-*******"
        word_result.append(word)     # word_result = ["park", "800905-*******"]
    result.append(" ".join(word_result))
print('\n'.join(result))
With regular expressions, it gets much simpler.
Calling pat.sub(repl, string) on a compiled pattern replaces every match of the pattern in string with repl.
import re
data = """
park 800905-1049118
kim 700905-1059119
"""
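# Regular expression: six digits, a hyphen, then seven digits.
# Raw strings (r'...') keep backslashes such as \d and \g intact.
pat = re.compile(r'(\d{6})[-](\d{7})')
print(pat.sub(r"\g<1>-*******", data))   # mask the last seven digits
print(pat.sub(r"******-\g<2>", data))    # mask the first six digits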
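To make the \g<1> and \g<2> references less mysterious, the sketch below inspects what each pair of parentheses captures before substituting. The sample string and the variable names are illustrative assumptions, not part of the original example.

```python
import re

pat = re.compile(r'(\d{6})-(\d{7})')
sample = 'park 800905-1049118'   # illustrative input

m = pat.search(sample)
whole = m.group(0)    # the entire match
front = m.group(1)    # first parenthesized group: six digits
back = m.group(2)     # second parenthesized group: seven digits
print(whole, front, back)

# \g<n> in the replacement string inserts the text captured by group n.
swapped = pat.sub(r'\g<2>-\g<1>', sample)
print(swapped)
```

Swapping the two groups is only a way to show that each reference pulls in its own captured text; masking works the same way, with one group replaced by literal asterisks.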
Meta characters
Meta character | Description | Example |
[] | Matches any one character from the set. | "[a-z]" |
\ | Introduces a special sequence or escapes a meta character. | "\d" |
. | Matches any single character (except a newline) at that position. | "Ja.v." |
^ | Matches the pattern at the beginning of the string. | "^Python" |
$ | Matches the pattern at the end of the string. | "Meadow$" |
* | Matches zero or more occurrences of the preceding element. | "hello*" |
+ | Matches one or more occurrences of the preceding element. | "hello+" |
{} | Matches the specified number of occurrences of the preceding element. | "python{2}" |
| | Matches either the pattern on its left or the one on its right. | "hello|world" |
() | Captures and groups part of the pattern. | "(agilemeadow)" |
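As a rough illustration of the table, the sketch below runs a few of these meta characters against made-up strings. The test strings and the dictionary keys are assumptions chosen for the demonstration.

```python
import re

# Each entry applies one meta character to an illustrative string.
checks = {
    'set': re.findall(r'[a-c]', 'abcdef'),
    'dot': re.search(r'c.t', 'a cut above').group(),
    'start': re.search(r'^Python', 'I like Python'),
    'end': re.search(r'Meadow$', 'agile Meadow').group(),
    'either': re.search(r'hello|world', 'big world').group(),
    'groups': re.search(r'(agile)(meadow)', 'agilemeadow').groups(),
    # {2} applies only to the preceding element: here, the letter n.
    'last_char_only': re.search(r'python{2}', 'pythonpython'),
    # Wrapping the word in () makes {2} repeat the whole word.
    'whole_group': re.search(r'(python){2}', 'pythonpython').group(),
}
for name, value in checks.items():
    print(name, '->', value)
```

The 'start' and 'last_char_only' entries come back as None: "Python" is not at the beginning of that string, and "python{2}" looks for "pythonn" rather than "pythonpython".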
[]
import re
# [abc] : matches if the character is a, b, or c.
# match() only checks the beginning of the string.
p1 = re.compile('[abc]')
print(p1.match('a')) # match='a'
print(p1.match('before')) # match='b'
print(p1.match('dude')) # None
p = re.match('[abc]', 'a') # match='a'
print(p)
*
Match if the preceding character repeats zero or more times.
p3 = re.compile('ca*t')
print(p3.match('ct')) # match='ct'
print(p3.match('cat')) # match='cat'
print(p3.match('caaat')) # match='caaat'
+
Match if the preceding character repeats one or more times.
p4 = re.compile('ca+t')
print(p4.match('ca')) # None
print(p4.match('cat')) # match='cat'
print(p4.match('caaat')) # match='caaat'
{a}
Match if the preceding character repeats exactly a times.
p5 = re.compile('ca{2}t')
print(p5.match('cat')) # None
print(p5.match('caat')) # match='caat'
{a,b}
Match if the preceding character repeats between a and b times (inclusive).
p6 = re.compile('ca{2,5}t')
print(p6.match('cat')) # None
print(p6.match('caat')) # match='caat'
print(p6.match('caaaaat')) # match='caaaaat'
?
Match if the preceding character occurs zero times or once.
p7 = re.compile('ab?c')
print(p7.match('ac')) # match='ac'
print(p7.match('abc')) # match='abc'
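The repetition operators above can be compared side by side. The following sketch, with test words generated for illustration, records which words each pattern matches.

```python
import re

# Test words with 0, 1, 2, and 6 repetitions of 'a' (illustrative).
words = ['c' + 'a' * n + 't' for n in (0, 1, 2, 6)]
patterns = ['ca*t', 'ca+t', 'ca{2}t', 'ca{2,5}t', 'ca?t']

matches = {}
for pattern in patterns:
    compiled = re.compile(pattern)
    # Keep the words that this pattern matches from the start.
    matches[pattern] = [w for w in words if compiled.match(w)]
    print(pattern, matches[pattern])
```

Only 'ca*t' accepts every word, while 'ca{2,5}t' rejects the six-a word because six is outside the 2 to 5 range.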
Example
The results differ depending on whether you add + or not.
\s matches a whitespace character (such as a space), whereas \S matches any non-whitespace character.
import re
m1 = re.match('[0-9]', '1234')
print(m1) # match='1'
print(m1.group()) # 1
m2 = re.match('[0-9]', 'abc')
print(m2) # None
m3 = re.match('[0-9]+', '1234')
print(m3) # match='1234'
print(m3.group()) # 1234
m4 = re.match('[0-9]+', ' 1234')
print(m4) # None
m5 = re.match(r'\s[0-9]+', ' 1234')
print(m5) # match=' 1234'
print(m5.group()) # ' 1234' (the leading space is part of the match)
m6 = re.search('[0-9]+', ' 1234')
print(m6) # match='1234'
print(m6.group()) # 1234
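The example above uses \s but not its opposite. A minimal sketch of \S, using an illustrative string, could look like this; it also shows again that match() only looks at the start of the string while search() scans forward.

```python
import re

text = ' 1234 abc'   # illustrative string that starts with a space

at_start = re.match(r'\S+', text)            # the first character is whitespace
first_word = re.search(r'\S+', text).group() # first run of non-whitespace
all_words = re.findall(r'\S+', text)         # every run of non-whitespace
spaces = re.findall(r'\s', text)             # every single whitespace character

print(at_start)
print(first_word)
print(all_words)
print(spaces)
```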
To search strings
match()
import re
p = re.compile('[a-z]+')
m1 = p.match('python')
print(m1) # match='python'
m2 = p.match('Python')
print(m2) # None
m3 = p.match('pythoN')
print(m3) # match='pytho'
m4 = p.match('pyThon')
print(m4) # match='py'
m5 = p.match('3 python')
print(m5) # None
search()
print('search() function')
s1 = p.search('python')
s2 = p.search('Python')
s3 = p.search('pythoN')
s4 = p.search('pyThon')
s5 = p.search('3 python')
print(s1) # match='python'
print(s2) # match='ython'
print(s3) # match='pytho'
print(s4) # match='py'
print(s5) # match='python'
findall()
result1 = p.findall('life is too short')
print(type(result1)) # 'list'
print(result1) # ['life', 'is', 'too', 'short']
result2 = p.findall('Life is tOo shorT')
print(result2) # ['ife', 'is', 't', 'o', 'shor']
finditer()
result3 = p.finditer('life is too short')
print(type(result3)) # 'callable_iterator'
print(result3) # e.g. <callable_iterator object at 0x0000020A8F53DC48> (the address varies)
for r in result3:
    print(r) # each r is a match object
result4 = p.finditer('Life is tOo shorT')
for r in result4:
    print(r)
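Each item produced by finditer() is a match object, so the matched text and its position can be read from it. A small sketch, assuming the same pattern as above and a list name chosen for illustration:

```python
import re

p = re.compile('[a-z]+')

found = []
for m in p.finditer('Life is tOo shorT'):
    # group() is the matched text; span() is its (start, end) position.
    found.append((m.group(), m.span()))
    print(m.group(), m.start(), m.end())
```

Unlike findall(), which returns only the matched strings, this keeps track of where each match sits in the original text.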
sub()
sub() replaces every part of a string that matches the pattern with a replacement string.
p.sub(replacement, string)
import re
p = re.compile('blue|white|red')
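# Replace every occurrence of blue, white, or red with 'gold'.
print(p.sub('gold', 'blue socks and red shoes')) # gold socks and gold shoes
# count=1 replaces only the first match.
print(p.sub('silver', 'blue socks and red shoes', count=1)) # silver socks and red shoes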
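As a small sketch of how sub() behaves, the code below keeps the results in variables (names chosen for illustration) and also calls the module-level re.sub(), which takes the pattern as its first argument.

```python
import re

text = 'blue socks and red shoes'
p = re.compile('blue|white|red')

all_replaced = p.sub('gold', text)             # every match is replaced
first_only = p.sub('silver', text, count=1)    # only the first match is replaced
via_module = re.sub('blue|white|red', 'gold', text)  # same result without compile()

print(all_replaced)
print(first_only)
print(text)   # sub() returns a new string; the original is unchanged
```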

Example
This example replaces the last four digits of each phone number with "####".
import re
s = """
park 010-9999-9988
kim 010-9909-7789
lee 010-8789-7768
"""
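# The group captures the first two blocks (e.g. 010-9999); the last four digits are left out of it.
pat = re.compile(r"(\d{3}[-]\d{4})[-]\d{4}")
result = pat.sub(r"\g<1>-####", s)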
print(result)
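One possible variation on the phone number example is to give the group a name instead of a number, which can make longer patterns easier to read. The group name head below is an assumption chosen for illustration.

```python
import re

s = """
park 010-9999-9988
kim 010-9909-7789
lee 010-8789-7768
"""

# (?P<head>...) names the group; \g<head> refers to it in the replacement.
pat = re.compile(r'(?P<head>\d{3}-\d{4})-\d{4}')
masked = pat.sub(r'\g<head>-####', s)
print(masked)
```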