Talk:Regular expression

From Wikipedia, the free encyclopedia

Bad writing style in Unicode > Normalization

The paragraph waits until the end to say what normalization is; that should be stated in plain words at the beginning of the text.

I tried to understand the meaning of normalization in the context of regular expressions. But when I read the Normalization paragraph at Regular_expression#Unicode, it told me about Unicode and some typewriter history, only to end with the final words "[...] is normalization".

A better writing style would be: "Normalization means something something", and only then go into examples and history lessons. GavriilaDmitriev (talk • they/them) 03:10, 24 April 2023 (UTC)

I need a regular expression for this

The original is 15. 12. 1983, but I want 15.12.1983. In the AWB advanced settings I am putting (\d{1,2}.\s\d{1,2}.\s\d{4}) in the find box and (\d{1,2}.\d{1,2}.\d{4}) in the replace box, but it's not working. --Tmamatha (talk) 07:11, 22 June 2023 (UTC)

Perhaps ask somewhere linked from WP:AWB, or try WP:VPT. Johnuniq (talk) 07:22, 22 June 2023 (UTC)
@Tmamatha: . is a metacharacter, so it needs to be escaped with a \; try (\d{1,2}\.)\s+(\d{1,2}\.)\s+(\d{4})
Try [1] for a regular expression tester with explanations. Bazza (talk) 08:54, 22 June 2023 (UTC)
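
The suggestion above can be tried outside AWB. What follows is only a minimal sketch in Python, not AWB itself, so the replacement syntax is an assumption for illustration: Python refers to the captured groups as \1, \2 and \3, and the helper name tighten_date is hypothetical.

```python
import re

# Escaped dots match a literal "."; \s+ matches the whitespace to be dropped.
DATE_WITH_SPACES = re.compile(r"(\d{1,2}\.)\s+(\d{1,2}\.)\s+(\d{4})")


def tighten_date(text: str) -> str:
    """Rewrite dates such as '15. 12. 1983' as '15.12.1983' (hypothetical helper)."""
    # The replacement refers to the three captured groups instead of
    # repeating the search pattern.
    return DATE_WITH_SPACES.sub(r"\1\2\3", text)


print(tighten_date("15. 12. 1983"))
```

The key point is that the replacement is built from the groups captured by the search pattern; the whitespace is dropped simply because it sits outside every group.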

Replace first thumbnail image with a non religious image and with non ECMAScript expression

For accessibility and WP:NPOV, perhaps "The quick brown fox jumps over the lazy dog" with the pattern [aeiou]+. Looks like the current image was taken from https://regexr.com/.

If I understand correctly, the current /h[aeiou]+/g in the thumbnail follows an ECMAScript convention (https://262.ecma-international.org/5.1/#sec-7.8.5) but the caption doesn't say so; hence I would also drop the prefix / and the suffix /g. 31.20.106.40 (talk) 11:47, 10 October 2023 (UTC)

Not just ECMAScript uses the slash regex syntax. Perl, which played a key role in the growth of more complex features, uses this /expression/flag thing; that in turn evolved from ed's /expression/ syntax. As for religiousness, I don't care much about it. I am concerned that the new [aeiou]+ pattern is too simple, however. (Try a longer, neutral text: The Universal Declaration of Human Rights looks good against h[aeiou]+.) Artoria2e5 🌉 12:20, 8 February 2024 (UTC)
I created the current image in 2022 without giving it much thought, mostly just intending to give a flavour of the complexity of regular expressions. The previous illustration was File:The river effect in justified text.jpg which looked more like a regular text search for a double space. It also had the complex example code of (?<=\.) {2,}(?=[A-Z]), I assume because the image came first (it's from the sentence spacing article) and the regexp was written to fit.
I think the lead image example just needs to be simple enough that somebody learning about regular expressions for the first time would quickly understand the concept and be able to more or less see what the regexp search term meant - and also complex enough that the same reader could, in the highlighted output, see the power it had above a regular text search.
A straight [aeiou]+ does seem too simple, as in practice (assuming that we're keeping things simple and only using a single highlight colour) the output would be the same as for [aeiou]. Belbury (talk) 17:18, 26 February 2024 (UTC)
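
The point about [aeiou]+ and [aeiou] shading the same characters can be checked with a small sketch. It uses the pangram proposed at the top of this thread; the helper name highlighted is hypothetical, and Python's re module stands in for whatever tool would render the image.

```python
import re

TEXT = "The quick brown fox jumps over the lazy dog"


def highlighted(pattern: str, text: str) -> set[int]:
    """Return the indices of all characters covered by any match of pattern."""
    covered: set[int] = set()
    for match in re.finditer(pattern, text):
        covered.update(range(match.start(), match.end()))
    return covered


# With a single highlight colour, both patterns shade the same characters;
# they differ only in how the shaded characters are grouped into matches.
print(highlighted(r"[aeiou]+", TEXT) == highlighted(r"[aeiou]", TEXT))
print(re.findall(r"[aeiou]+", TEXT))
print(re.findall(r"[aeiou]", TEXT))
```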
I'd prefer a non-religious text, too. What about the US Declaration of Independence (since the EN wiki server resides in the US)? Or some famous text by, e.g., Shakespeare? The search pattern /h[aeiou]+/g seems fine for any of these. - Jochen Burghardt (talk) 17:12, 27 February 2024 (UTC)
Done: Since nobody objected, I've implemented my suggestion, using the start of Antony's burial speech in Julius Caesar by Shakespeare. I changed the pattern to /r[aeiouy]+/ to get a more interesting image; if considering "y" as a vowel is a problem, let me know; I can remove it from the pattern. - Jochen Burghardt (talk) 19:12, 12 March 2024 (UTC)
@Jochen Burghardt: Good move, thanks. Some readers may find the inclusion of "y" a bit odd. Amending the description might be better than removing the letter, though, so perhaps
Shaded text shows the match results of the regular expression pattern /r[aeiouy]+/g which finds all occurrences of the letter r followed by one or more vowels or the letter y.
(That also takes care of MOS:COLOR.) Bazza 7 (talk) 13:35, 13 March 2024 (UTC)
Good call, thanks for taking the time to find a quote. There is a small issue, though: "Romans" isn't highlighted in the example. Not sure if it would be better to update the image and include an /i option in the caption, or update the caption to a lower case r followed by one or more lower-case ... Belbury (talk) 13:50, 13 March 2024 (UTC)
Thanks for the corrections. I have now omitted the "y" in the picture, in order to keep the informal description short (just "vowel"). Moreover, I changed "letter r" to "lower case r" in the description, in order not to presuppose too much knowledge about search options like /g (which seems unavoidable) and /i. An alternative could be /[Rr][aeiou]/g, which is unnecessarily complicated, however (exemplifying [] just once is sufficient). - Jochen Burghardt (talk) 14:47, 14 March 2024 (UTC)
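
For anyone wanting to compare the variants discussed in this thread, here is a small sketch. It assumes only the well-known first line of the speech (the image may show more text), uses a + after the character class in every variant, and uses Python's re.findall, which returns all matches much like the /g flag; re.IGNORECASE plays the role of /i.

```python
import re

# First line of Antony's speech in Julius Caesar (assumed sample text).
TEXT = "Friends, Romans, countrymen, lend me your ears"

lower_r = re.findall(r"r[aeiou]+", TEXT)                  # lower-case r only
any_case = re.findall(r"r[aeiou]+", TEXT, re.IGNORECASE)  # comparable to /i
char_class = re.findall(r"[Rr][aeiou]+", TEXT)            # explicit [Rr]
with_y = re.findall(r"r[aeiouy]+", TEXT)                  # earlier variant with "y"

print(lower_r)     # → ['rie']
print(any_case)    # → ['rie', 'Ro']
print(char_class)  # → ['rie', 'Ro']
print(with_y)      # → ['rie', 'ry']
```

Only the case-insensitive and [Rr] variants pick up the "Ro" of "Romans", and only the variant with "y" picks up the "ry" of "countrymen".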

Perhaps an error

The text on the main page says this: "Every regular expression can be written solely in terms of the Kleene star and set unions over finite words." I think concatenation is also needed; if you have only Kleene star and unions over finite sets of words, you cannot make {1} concatenated with {0}* (the set of words starting with 1 followed by arbitrarily many zeroes). 137.132.217.132 (talk) 09:07, 12 March 2024 (UTC)
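
The objection can be made precise with a short induction. This is only a sketch, assuming (as the comment above does) that the sole building blocks are finite sets of words, union, and Kleene star; the symbol $\mathcal{C}$ is introduced here purely for illustration.

```latex
Let $\mathcal{C}$ be the smallest class of languages that contains every finite
set of words and is closed under union $\cup$ and Kleene star ${}^{*}$.

\textbf{Claim.} Every infinite $L \in \mathcal{C}$ contains the empty word $\varepsilon$.

\textbf{Proof sketch} (induction on the construction of $L$).
\begin{itemize}
  \item $L$ is a finite set of words: nothing to show.
  \item $L = A^{*}$: $\varepsilon \in A^{*}$ by definition of the star.
  \item $L = A \cup B$ is infinite: then $A$ or $B$ is infinite, so it contains
        $\varepsilon$ by the induction hypothesis, hence $\varepsilon \in L$.
\end{itemize}

Since $\{1\}\{0\}^{*} = \{1,\, 10,\, 100,\, \dots\}$ is infinite and
$\varepsilon \notin \{1\}\{0\}^{*}$, it follows that
$\{1\}\{0\}^{*} \notin \mathcal{C}$.
```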