Python regex: re.match(), re.search(), re.findall() with example
Contents:
- Special Sequences
- Real-valued distributions
- UserList objects
- Major new features of the 3.8 series, compared to 3.7
- re.split()
- [Collection] What Are the Different Python Re Quantifiers?
- deque objects
- This is the stable release of Python 3.10.0
- Major new features of the 3.10 series, compared to 3.9
- More resources
- And now for something completely different
- Relationship to other Python modules
- Functions for sequences
- Search and Replace
Special Sequences
A special sequence is a backslash (\) followed by one of the characters in the list below, and has a special meaning:
| Character | Description | Example |
|---|---|---|
| \A | Returns a match if the specified characters are at the beginning of the string | "\AThe" |
| \b | Returns a match where the specified characters are at the beginning or at the end of a word (the "r" prefix makes sure that the string is treated as a "raw string") | r"\bain", r"ain\b" |
| \B | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" prefix makes sure that the string is treated as a "raw string") | r"\Bain", r"ain\B" |
| \d | Returns a match where the string contains a digit (0-9) | "\d" |
| \D | Returns a match where the string contains a character that is NOT a digit | "\D" |
| \s | Returns a match where the string contains a white space character | "\s" |
| \S | Returns a match where the string contains a character that is NOT white space | "\S" |
| \w | Returns a match where the string contains a word character (letters a to z and A to Z, digits 0-9, and the underscore _ character) | "\w" |
| \W | Returns a match where the string contains a character that is NOT a word character | "\W" |
| \Z | Returns a match if the specified characters are at the end of the string | "Spain\Z" |
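The sequences in the table can be tried out with a short script. This is a minimal sketch; the sample string and the variable names are made up for illustration.

```python
import re

text = "The rain in Spain 2024"

# \A anchors at the start of the string, \Z at the very end.
starts_with_the = re.search(r"\AThe", text) is not None
ends_with_digits = re.search(r"\d+\Z", text) is not None

# \b matches at a word boundary.
ain_at_word_start = re.findall(r"\bain", text)  # no word starts with "ain"
ain_at_word_end = re.findall(r"ain\b", text)    # "rain" and "Spain" end with it

# \d, \s and \w match a digit, a whitespace and a word character.
digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)
words = re.findall(r"\w+", text)

print(starts_with_the, ends_with_digits)
print(ain_at_word_start, ain_at_word_end)
print(digits, len(spaces), words)
```

Raw strings (the r prefix) are used throughout so that Python does not interpret the backslashes before the re module sees them.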
Real-valued distributions
The following functions generate specific real-valued distributions. Function
parameters are named after the corresponding variables in the distribution’s
equation, as used in common mathematical practice; most of these equations can
be found in any statistics text.
- random.random()
  Return the next random floating point number in the range [0.0, 1.0).
- random.uniform(a, b)
  Return a random floating point number N such that a <= N <= b for a <= b and b <= N <= a for b < a.
  The end-point value b may or may not be included in the range depending on floating-point rounding in the equation a + (b-a) * random().
- random.triangular(low, high, mode)
  Return a random floating point number N such that low <= N <= high and with the specified mode between those bounds. The low and high bounds default to zero and one. The mode argument defaults to the midpoint between the bounds, giving a symmetric distribution.
- random.betavariate(alpha, beta)
  Beta distribution. Conditions on the parameters are alpha > 0 and beta > 0. Returned values range between 0 and 1.
- random.expovariate(lambd)
  Exponential distribution. lambd is 1.0 divided by the desired mean. It should be nonzero. (The parameter would be called "lambda", but that is a reserved word in Python.) Returned values range from 0 to positive infinity if lambd is positive, and from negative infinity to 0 if lambd is negative.
- random.gammavariate(alpha, beta)
  Gamma distribution. (Not the gamma function!) Conditions on the parameters are alpha > 0 and beta > 0.
  The probability distribution function is:
  pdf(x) = (x ** (alpha - 1) * math.exp(-x / beta)) / (math.gamma(alpha) * beta ** alpha)
- random.gauss(mu, sigma)
  Normal distribution, also called the Gaussian distribution. mu is the mean, and sigma is the standard deviation. This is slightly faster than the normalvariate() function defined below.
  Multithreading note: When two threads call this function simultaneously, it is possible that they will receive the same return value. This can be avoided in three ways. 1) Have each thread use a different instance of the random number generator. 2) Put locks around all calls. 3) Use the slower, but thread-safe normalvariate() function instead.
- random.lognormvariate(mu, sigma)
  Log normal distribution. If you take the natural logarithm of this distribution, you'll get a normal distribution with mean mu and standard deviation sigma. mu can have any value, and sigma must be greater than zero.
- random.normalvariate(mu, sigma)
  Normal distribution. mu is the mean, and sigma is the standard deviation.
- random.vonmisesvariate(mu, kappa)
  mu is the mean angle, expressed in radians between 0 and 2*pi, and kappa is the concentration parameter, which must be greater than or equal to zero. If kappa is equal to zero, this distribution reduces to a uniform random angle over the range 0 to 2*pi.
- random.paretovariate(alpha)
  Pareto distribution. alpha is the shape parameter.
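As a rough illustration of how the parameters map to results, the sketch below draws from a few of these distributions with a seeded random.Random instance. The seed, sample sizes and parameter values are arbitrary choices made for this example.

```python
import random
import statistics

rng = random.Random(12345)  # private, seeded generator for repeatable runs

u = rng.uniform(2.0, 5.0)           # 2.0 <= u <= 5.0
t = rng.triangular(0.0, 10.0, 3.0)  # between 0 and 10, most likely near 3
b = rng.betavariate(2.0, 5.0)       # between 0 and 1

# lambd is 1.0 divided by the desired mean, so a mean of 4.0 needs lambd = 0.25.
exp_samples = [rng.expovariate(1.0 / 4.0) for _ in range(20_000)]

# gauss(mu, sigma): mean 10, standard deviation 2.
gauss_samples = [rng.gauss(10.0, 2.0) for _ in range(20_000)]

exp_mean = statistics.fmean(exp_samples)
gauss_mean = statistics.fmean(gauss_samples)
gauss_stdev = statistics.stdev(gauss_samples)

print(u, t, b)
print(round(exp_mean, 2), round(gauss_mean, 2), round(gauss_stdev, 2))
```

The sample means and standard deviation should land close to the requested values, though not exactly on them. Using a separate Random instance per thread is also one of the ways to avoid the gauss() threading issue mentioned above.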
UserList objects
This class acts as a wrapper around list objects.  It is a useful base class
for your own list-like classes which can inherit from them and override
existing methods or add new ones.  In this way, one can add new behaviors to
lists.
The need for this class has been partially supplanted by the ability to
subclass directly from list; however, this class can be easier
to work with because the underlying list is accessible as an attribute.
- class collections.UserList([list])
  Class that simulates a list. The instance's contents are kept in a regular list, which is accessible via the data attribute of UserList instances. The instance's contents are initially set to a copy of list, defaulting to the empty list []. list can be any iterable, for example a real Python list or a UserList object.
  In addition to supporting the methods and operations of mutable sequences, UserList instances provide the following attribute:
  - data
    A real list object used to store the contents of the UserList class.
Subclassing requirements: Subclasses of UserList are expected to
offer a constructor which can be called with either no arguments or one
argument.  List operations which return a new sequence attempt to create an
instance of the actual implementation class.  To do so, it assumes that the
constructor can be called with a single parameter, which is a sequence object
used as a data source.
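A minimal sketch of such a subclass follows. PositiveList is a hypothetical class invented for this example; it validates items in append() and keeps a constructor that accepts either no arguments or one sequence, as the subclassing requirements ask.

```python
from collections import UserList


class PositiveList(UserList):
    """Hypothetical list-like class that only accepts positive numbers."""

    def __init__(self, initlist=None):
        super().__init__()
        # Route initial items through append() so they are validated too.
        for item in (initlist or []):
            self.append(item)

    def append(self, item):
        if item <= 0:
            raise ValueError(f"expected a positive number, got {item!r}")
        self.data.append(item)  # self.data is the underlying real list


nums = PositiveList([1, 2, 3])
nums.append(4)
extended = nums + [5]  # operations that build a new sequence keep the subclass

print(nums.data)
print(type(extended).__name__, extended.data)
```

Only append() is overridden here, so other mutating methods such as insert() or extend() would still bypass the check; a fuller implementation would override those as well.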
Major new features of the 3.8 series, compared to 3.7
- PEP 572, Assignment expressions
- PEP 570, Positional-only arguments
- PEP 587, Python Initialization Configuration (improved embedding)
- PEP 590, Vectorcall: a fast calling protocol for CPython
- PEP 578, Runtime audit hooks
- PEP 574, Pickle protocol 5 with out-of-band data
- Typing-related: PEP 591 (Final qualifier), PEP 586 (Literal types), and PEP 589 (TypedDict)
- Parallel filesystem cache for compiled bytecode
- Debug builds share ABI as release builds
- f-strings support a handy = specifier for debugging
- continue is now legal in finally: blocks
- on Windows, the default asyncio event loop is now ProactorEventLoop
- on macOS, the spawn start method is now used by default in multiprocessing
- multiprocessing can now use shared memory segments to avoid pickling costs between processes
- typed_ast is merged back to CPython
- LOAD_GLOBAL is now 40% faster
- pickle now uses Protocol 4 by default, improving performance
There are many other interesting changes; please consult the "What's New" page in the documentation for a full list.
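Three of the syntax additions listed above can be seen together in a few lines. This is an illustrative sketch that needs Python 3.8 or newer; the clamp function and the sample data are made up for the example.

```python
def clamp(value, low, high, /):
    """PEP 570: the '/' makes value, low and high positional-only."""
    return max(low, min(value, high))


data = [4, 8, 15, 16, 23, 42]

# PEP 572: ':=' assigns and tests in a single expression.
if (n := len(data)) > 5:
    message = f"{n} items"

# f-string '=' specifier: includes the expression text along with its value.
debug = f"{n=}"

print(clamp(120, 0, 100), message, debug)
```

Calling clamp(value=120, low=0, high=100) raises a TypeError, because positional-only parameters cannot be passed by keyword.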
re.split()
This method splits a string by the given pattern. Wherever the pattern matches, the string is cut, and the pieces between the matches are returned as a list. We can also specify the maximum number of splits to perform.
Syntax:
re.split(pattern, string, maxsplit=0, flags=0)
The return value is a list of the strings into which the original string was split. If the pattern does not match anywhere, the list contains the original string as its only element.
Let's look at how this method works with an example.
import re
# r'\W+' matches one or more characters that are not letters, digits, or underscores
# here the string is split on the comma ',' and the space ' '
print(re.split(r'\W+', 'Good, better , Best'))
print(re.split(r'\W+', "Book's books Books"))
# Here ':', ' ' and ',' are the non-alphanumeric characters the string is split on
print(re.split(r'\W+', 'Born On 20th July 1989, at 11:00 AM'))
# r'\d+' matches one or more digits
# The string is split on '20', '1989', '11', '00'
print(re.split(r'\d+', 'Born On 20th July 1989, at 11:00 AM'))
# The maximum number of splits is set to 1
print(re.split(r'\d+', 'Born On 20th July 1989, at 11:00 AM', maxsplit=1))
# Result:
# ['Good', 'better', 'Best']
# ['Book', 's', 'books', 'Books']
# ['Born', 'On', '20th', 'July', '1989', 'at', '11', '00', 'AM']
# ['Born On ', 'th July ', ', at ', ':', ' AM']
# ['Born On ', 'th July 1989, at 11:00 AM']
[Collection] What Are the Different Python Re Quantifiers?
If you want to use (and understand) regular expressions in practice, you need to know the most important quantifiers that can be applied to any regex (including the dot regex)!
So let's dive into the other regexes:
| Quantifier | Description | Example |
|---|---|---|
| . | The wild-card ("dot") matches any character in a string except the newline character "\n". | Regex '...' matches all words with three characters such as "abc", "cat", and "dog". |
| * | The zero-or-more asterisk matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. | Regex 'cat*' matches the strings "ca", "cat", "catt", "cattt", and "catttttttt". |
| ? | The zero-or-one matches (as the name suggests) either zero or one occurrence of the immediately preceding regex. | Regex 'cat?' matches both strings "ca" and "cat", but not "catt", "cattt", and "catttttttt". |
| + | The at-least-one matches one or more occurrences of the immediately preceding regex. | Regex 'cat+' does not match the string "ca" but matches all strings with at least one trailing character "t" such as "cat", "catt", and "cattt". |
| ^ | The start-of-string matches the beginning of a string. | Regex '^p' matches the strings "python" and "programming" but not "lisp" and "spying", where the character "p" does not occur at the start of the string. |
| $ | The end-of-string matches the end of a string. | Regex 'py$' matches the strings "main.py" and "pypy" but not the strings "python" and "pypi". |
| A\|B | The OR matches either regex A or regex B. Note that the intuition is quite different from the standard interpretation of the or operator, which can also satisfy both conditions. | Regex '(hello)\|(hi)' matches the strings "hello world" and "hi python". It would not make sense to try to match both of them at the same time. |
| AB | The AND matches first regex A and then regex B, in this sequence. | We have already seen it trivially in the regex 'ca', which matches first regex 'c' and second regex 'a'. |
Note that I gave the above operators some more meaningful names (at the start of each description) so that you can immediately grasp the purpose of each regex. For example, the '^' operator is usually denoted as the "caret" operator.
Those names are not descriptive, so I came up with more kindergarten-like words such as the "start-of-string" operator.
We have already seen many examples, but let's dive into even more!
import re
text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''
print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
'''
print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
'''
print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
'''
print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
'''
print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character. 
Can you figure out why Python doesn't find any?
[]
'''
print(re.findall('\n$', text))
'''
Finds all occurrences where the new-line character '\n'
occurs at the end of the string.
'''
print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
'''
In these examples, you have already seen the special symbol '\n', which denotes the new-line character in Python (and most other languages). There are many special characters that are specifically designed for regular expressions.
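The empty result for '^Ha.*' in the example comes from the fact that, by default, '^' only matches at the very beginning of the whole string, and that string starts with a new-line character. The sketch below shows the difference the re.MULTILINE flag makes, using a shortened, unindented sample text chosen for this illustration.

```python
import re

# The string starts with a newline, just like a triple-quoted block of text.
text = "\nHa! let me see her\nHer blood is settled"

# Without flags, '^' anchors only at the very start of the string.
no_flag = re.findall(r'^Ha.*', text)

# With re.MULTILINE, '^' also matches right after each newline.
multi = re.findall(r'^Ha.*', text, flags=re.MULTILINE)

print(no_flag)  # []
print(multi)    # ['Ha! let me see her']
```

If the lines of the text are indented, the pattern also has to allow for the leading spaces, for example with r'^\s*Ha.*'.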
deque objects
- class collections.deque([iterable[, maxlen]])
  Returns a new deque object initialized left-to-right (using append()) with data from iterable. If iterable is not specified, the new deque is empty.
  Deques are a generalization of stacks and queues (the name is pronounced "deck" and is short for "double-ended queue"). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.
  Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.
  If maxlen is not specified or is None, deques may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. Bounded length deques provide functionality similar to the tail filter in Unix. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.
  Deque objects support the following methods:
  - append(x)
    Add x to the right side of the deque.
  - appendleft(x)
    Add x to the left side of the deque.
  - clear()
    Remove all elements from the deque leaving it with length 0.
  - copy()
    Create a shallow copy of the deque. New in version 3.5.
  - count(x)
    Count the number of deque elements equal to x. New in version 3.2.
  - extend(iterable)
    Extend the right side of the deque by appending elements from the iterable argument.
  - extendleft(iterable)
    Extend the left side of the deque by appending elements from iterable. Note, the series of left appends results in reversing the order of elements in the iterable argument.
  - index(x[, start[, stop]])
    Return the position of x in the deque (at or after index start and before index stop). Returns the first match or raises ValueError if not found. New in version 3.5.
  - insert(i, x)
    Insert x into the deque at position i. If the insertion would cause a bounded deque to grow beyond maxlen, an IndexError is raised. New in version 3.5.
  - pop()
    Remove and return an element from the right side of the deque. If no elements are present, raises an IndexError.
  - popleft()
    Remove and return an element from the left side of the deque. If no elements are present, raises an IndexError.
  - remove(value)
    Remove the first occurrence of value. If not found, raises a ValueError.
  - reverse()
    Reverse the elements of the deque in-place and then return None. New in version 3.2.
  - rotate(n=1)
    Rotate the deque n steps to the right. If n is negative, rotate to the left.
    When the deque is not empty, rotating one step to the right is equivalent to d.appendleft(d.pop()), and rotating one step to the left is equivalent to d.append(d.popleft()).
  Deque objects also provide one read-only attribute:
  - maxlen
    Maximum size of a deque or None if unbounded. New in version 3.1.
In addition to the above, deques support iteration, pickling, len(d),
reversed(d), copy.copy(d), copy.deepcopy(d), membership testing with
the in operator, and subscript references such as d[0] to access
the first element.  Indexed access is O(1) at both ends but slows to O(n) in
the middle.  For fast random access, use lists instead.
Starting in version 3.5, deques support __add__(), __mul__(),
and __imul__().
Example:
>>> from collections import deque
>>> d = deque('ghi')                 # make a new deque with three items
>>> for elem in d:                   # iterate over the deque's elements
...     print(elem.upper())
G
H
I
>>> d.append('j')                    # add a new entry to the right side
>>> d.appendleft('f')                # add a new entry to the left side
>>> d                                # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()                          # return and remove the rightmost item
'j'
>>> d.popleft()                      # return and remove the leftmost item
'f'
>>> list(d)                          # list the contents of the deque
['g', 'h', 'i']
>>> d[0]                             # peek at leftmost item
'g'
>>> d[-1]                            # peek at rightmost item
'i'
>>> list(reversed(d))                # list the contents of a deque in reverse
['i', 'h', 'g']
>>> 'h' in d                         # search the deque
True
>>> d.extend('jkl')                  # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1)                      # right rotation
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1)                     # left rotation
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> deque(reversed(d))               # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear()                        # empty the deque
>>> d.pop()                          # cannot pop from an empty deque
Traceback (most recent call last):
    File "<pyshell#6>", line 1, in -toplevel-
        d.pop()
IndexError: pop from an empty deque
>>> d.extendleft('abc')              # extendleft() reverses the input order
>>> d
deque(['c', 'b', 'a'])
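Bounded deques, created with maxlen, are the part of the description that the session above does not exercise. The following sketch shows two possible uses, with helper names chosen for this example: keeping the last n items of an iterable (like the Unix tail filter) and computing a moving average over the most recent values.

```python
from collections import deque


def tail(items, n=3):
    """Return the last n items of an iterable, like the Unix tail filter."""
    return list(deque(items, maxlen=n))


def moving_average(values, window=3):
    """Yield the average of each run of `window` consecutive values."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)  # the oldest value drops off the left when full
        if len(recent) == window:
            yield sum(recent) / window


last_three = tail(range(10))
averages = list(moving_average([40, 30, 50, 46, 39, 44]))

print(last_three)  # [7, 8, 9]
print(averages)    # [40.0, 42.0, 45.0, 43.0]
```

Summing the window on every step keeps the code short; for large windows, a running total that is adjusted as items enter and leave would avoid the repeated work.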
This is the stable release of Python 3.10.0
Python 3.10.0 is the newest major release of the Python programming language, and it contains many new features and optimizations.
Major new features of the 3.10 series, compared to 3.9
Among the major new features and changes so far:
- PEP 623 — Deprecate and prepare for the removal of the wstr member in PyUnicodeObject.
- PEP 604 — Allow writing union types as X | Y
- PEP 612 — Parameter Specification Variables
- PEP 626 — Precise line numbers for debugging and other tools.
- PEP 618 — Add Optional Length-Checking To zip.
- bpo-12782: Parenthesized context managers are now officially allowed.
- PEP 632 — Deprecate distutils module.
- PEP 613 — Explicit Type Aliases
- PEP 634 — Structural Pattern Matching: Specification
- PEP 635 — Structural Pattern Matching: Motivation and Rationale
- PEP 636 — Structural Pattern Matching: Tutorial
- PEP 644 — Require OpenSSL 1.1.1 or newer
- PEP 624 — Remove Py_UNICODE encoder APIs
- PEP 597 — Add optional EncodingWarning
bpo-38605: from __future__ import annotations (PEP 563) used to be on this list in previous pre-releases, but it has been postponed to Python 3.11 due to some compatibility concerns. You can read the Steering Council communication about it to learn more.
bpo-44828: A change in the newly released macOS 12 Monterey caused file open and save windows in IDLE and other tkinter applications to be unusable. As of 2021-11-03, the macOS 64-bit universal2 installer file for this release was updated to include a fix in the third-party Tk library for this problem. All other files are unchanged from the original 3.10.0 installer. If you have already installed 3.10.0 from this page and encounter this problem on macOS 12 Monterey, download and run the updated installer listed in the table below.
More resources
- Online Documentation
- PEP 619, 3.10 Release Schedule
- Report bugs at https://bugs.python.org.
- Help fund Python and its community.
And now for something completely different
For a Schwarzschild black hole (a black hole with no rotation or electromagnetic charge), given a free-falling particle starting at the event horizon, the maximum proper time it will experience before reaching the singularity (which happens when it falls without angular velocity) is π*M (in natural units), where M is the mass of the black hole. For Sagittarius A* (the black hole at the centre of the Milky Way) this time is approximately 1 minute.
Schwarzschild black holes are also unique because they have a space-like singularity at their core, which means that the singularity doesn't happen at a specific point in space but happens at a specific point in time (the future). This means once you are inside the event horizon you cannot point with your finger towards the direction the singularity is located because the singularity happens in your future: no matter where you move, you will "fall" into it.
| Version | Operating System | Description | MD5 Sum | File Size | GPG |
|---|---|---|---|---|---|
| Gzipped source tarball | Source release | | 729e36388ae9a832b01cf9138921b383 | 25007016 | SIG |
| XZ compressed source tarball | Source release | | 3e7035d272680f80e3ce4e8eb492d580 | 18726176 | SIG |
| macOS 64-bit universal2 installer | macOS | for macOS 10.9 and later (updated for macOS 12 Monterey) | 8575cc983035ea2f0414e25ce0289ab8 | 39735213 | SIG |
| Windows embeddable package (32-bit) | Windows | | dc9d1abc644dd78f5e48edae38c7bc6b | 7521592 | SIG |
| Windows embeddable package (64-bit) | Windows | | 340408540eeff359d5eaf93139ab90fd | 8474319 | SIG |
| Windows help file | Windows | | 9d7b80c1c23cfb2cecd63ac4fac9766e | 9559706 | SIG |
| Windows installer (32-bit) | Windows | | 133aa48145032e341ad2a000cd3bff50 | 27194856 | SIG |
| Windows installer (64-bit) | Windows | Recommended | c3917c08a7fe85db7203da6dcaa99a70 | 28315928 | SIG |
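One way to compare a downloaded file against the MD5 Sum column is to compute the digest with hashlib. This is only a sketch: md5_of_file is a helper written for this example, and the file name in the comment is a placeholder, not a path taken from this page.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large installers do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage, assuming the source tarball was saved under this name:
# md5_of_file("Python-3.10.0.tgz") == "729e36388ae9a832b01cf9138921b383"
```

A matching MD5 sum only indicates that the download is not corrupted; the GPG signatures in the last column are the means of checking authenticity.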
Relationship to other Python modules
Comparison with marshal
Python has a more primitive serialization module called marshal, but in
general pickle should always be the preferred way to serialize Python
objects. marshal exists primarily to support Python's .pyc
files.
The pickle module differs from marshal in several significant ways:
- The pickle module keeps track of the objects it has already serialized, so that later references to the same object won't be serialized again. marshal doesn't do this.
  This has implications both for recursive objects and object sharing. Recursive objects are objects that contain references to themselves. These are not handled by marshal, and in fact, attempting to marshal recursive objects will crash your Python interpreter. Object sharing happens when there are multiple references to the same object in different places in the object hierarchy being serialized. pickle stores such objects only once, and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.
- marshal cannot be used to serialize user-defined classes and their instances. pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored.
- The marshal serialization format is not guaranteed to be portable across Python versions. Because its primary job in life is to support .pyc files, the Python implementers reserve the right to change the serialization format in non-backwards compatible ways should the need arise. The pickle serialization format is guaranteed to be backwards compatible across Python releases provided a compatible pickle protocol is chosen and pickling and unpickling code deals with Python 2 to Python 3 type differences if your data is crossing that unique breaking change language boundary.
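The object-tracking behaviour described in the first point can be observed directly. The sketch below, with variable names invented for the example, pickles a container that references the same list twice and a list that contains itself.

```python
import pickle

# Object sharing: the same list is referenced from two places.
shared = [1, 2]
container = {"a": shared, "b": shared}
restored = pickle.loads(pickle.dumps(container))

# The two references still point to one object after the round trip.
still_shared = restored["a"] is restored["b"]
restored["a"].append(3)

# Recursive object: a list that contains a reference to itself.
recursive = []
recursive.append(recursive)
restored_recursive = pickle.loads(pickle.dumps(recursive))

print(still_shared, restored["b"])
print(restored_recursive[0] is restored_recursive)
```

Because the restored references are shared, appending through restored["a"] is also visible through restored["b"].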
Functions for sequences
- random.choice(seq)
  Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
- random.choices(population, weights=None, *, cum_weights=None, k=1)
  Return a k sized list of elements chosen from the population with replacement. If the population is empty, raises IndexError.
  If a weights sequence is specified, selections are made according to the relative weights. Alternatively, if a cum_weights sequence is given, the selections are made according to the cumulative weights (perhaps computed using itertools.accumulate()). For example, the relative weights [10, 5, 30, 5] are equivalent to the cumulative weights [10, 15, 45, 50]. Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.
  If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence. It is a TypeError to specify both weights and cum_weights.
  The weights or cum_weights can use any numeric type that interoperates with the float values returned by random() (that includes integers, floats, and fractions but excludes decimals). Weights are assumed to be non-negative and finite. A ValueError is raised if all weights are zero.
  For a given seed, the choices() function with equal weighting typically produces a different sequence than repeated calls to choice(). The algorithm used by choices() uses floating point arithmetic for internal consistency and speed. The algorithm used by choice() defaults to integer arithmetic with repeated selections to avoid small biases from round-off error.
  New in version 3.6. Changed in version 3.9: Raises a ValueError if all weights are zero.
- random.shuffle(x[, random])
  Shuffle the sequence x in place.
  The optional argument random is a 0-argument function returning a random float in [0.0, 1.0); by default, this is the function random().
  To shuffle an immutable sequence and return a new shuffled list, use sample(x, k=len(x)) instead.
  Note that even for small len(x), the total number of permutations of x can quickly grow larger than the period of most random number generators. This implies that most permutations of a long sequence can never be generated. For example, a sequence of length 2080 is the largest that can fit within the period of the Mersenne Twister random number generator.
  Deprecated since version 3.9, will be removed in version 3.11: The optional parameter random.
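The relationship between weights and cum_weights, and the difference between shuffle() and sample(), can be sketched as follows. The population, seeds and variable names are arbitrary values picked for this illustration.

```python
import random
from itertools import accumulate

population = ["red", "black", "green", "blue"]
weights = [10, 5, 30, 5]
cum_weights = list(accumulate(weights))  # [10, 15, 45, 50]

# Two generators with the same seed: relative and cumulative weights
# describe the same distribution, so the selections agree.
by_weights = random.Random(2024).choices(population, weights=weights, k=8)
by_cum = random.Random(2024).choices(population, cum_weights=cum_weights, k=8)

rng = random.Random(7)
deck = list(range(10))
rng.shuffle(deck)  # shuffles the list in place and returns None

frozen = ("a", "b", "c", "d")  # tuples are immutable, so use sample() instead
shuffled_copy = rng.sample(frozen, k=len(frozen))

print(by_weights == by_cum)
print(deck, shuffled_copy)
```

The optional random argument of shuffle() is left out here, since it is deprecated as noted above.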
Search and Replace
One of the most important re methods that use regular expressions is sub.
Syntax
re.sub(pattern, repl, string, count=0, flags=0)
This method replaces occurrences of the RE pattern in string with repl. All occurrences are replaced unless count is given, in which case at most count replacements are made. The method returns the modified string.
Example
#!/usr/bin/python3
import re
phone = "2004-959-559 # This is Phone Number"
# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)
# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)
When the above code is executed, it produces the following result:
Phone Num :  2004-959-559
Phone Num :  2004959559
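Beyond plain replacement strings, re.sub() also accepts a count limit, backreferences to captured groups, and a function as repl. The following sketch reuses the phone string from the example; the variable names are chosen for illustration.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# count=1 limits the substitution to the first match only.
first_dash_only = re.sub(r"-", "", phone, count=1)

# Groups captured by the pattern can be reused in repl as \1, \2, \3.
swapped = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3-\2-\1", phone)

# repl may also be a function that receives each match object.
masked = re.sub(r"\d", lambda match: "*", "2004-959-559")

print(first_dash_only)  # 2004959-559 # This is Phone Number
print(swapped)          # 559-959-2004 # This is Phone Number
print(masked)           # ****-***-***
```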