Module talk:Excerpt

Module:Excerpt is permanently protected from editing because it is a heavily used or highly visible module. Substantial changes should first be proposed and discussed here on this page. If the proposal is uncontroversial or has been discussed and is supported by consensus, editors may use {{edit template-protected}} to notify an administrator or template editor to make the requested edit.

This is the talk page for discussing improvements to the Excerpt module.


To help centralize discussions and keep related topics together, the talk pages for all the transclude excerpt templates redirect here (as of 15 May 2020 UTC):

|paragraphs=1 is grabbing paragraph 2 instead of paragraph 1

{{Excerpt|List of executive orders in the first presidency of Donald Trump|paragraphs=1|hat=no}}

I would like to grab

United States presidents issue executive orders (in addition to other executive actions) to help officers and agencies of the executive branch manage the operations within the federal government itself.

not

Donald Trump signed a total of 220 executive orders during his first term, from January 2017 to January 2021. As of January 2025, 72 of them (33%) have been revoked, many by his successor, Joe Biden.

Is this a bug in the module? Can it be fixed? Thanks. –Novem Linguae (talk) 00:22, 24 January 2025 (UTC)[reply]

It seems excerpt has seen and focused on the <onlyinclude> tag, causing it to center on the 2nd paragraph (which happens to be the onlyinclude content). I don't believe there's currently an option in the module to just ignore the tag. Aidan9382 _(talk) 09:48, 24 January 2025 (UTC)[reply]

Not sure what the best approach is here. I can see one possible approach that is documentation-only, thus turning the bug into a feature-with-explanation. I.e., if 'paragraphs=N' is described in the doc as "transcludable paragraph N", where a "transcludable paragraph" is one that is not excluded by source directives, HTML comments, and the like, then we are done. A reasonable place to describe this would be in a new paragraph in the existing section § Refinement using inclusion control. Thoughts? @Aidan9382 and Novem Linguae:. Mathglot (talk) 20:53, 1 July 2025 (UTC)[reply]

To be honest, I've always found it weird that excerpt even acknowledges onlyinclude in the first place. I've always seen onlyinclude as a rather aggressive (but effective) method of making a simple transclusion take specific content without too much work, but excerpt isn't a direct transclusion - it's a smart transclusion of the page that shouldn't need guiding by elements like onlyinclude, since it's given specific content to find on use already. I'd rather it be ignored by default, but that might be a bit of a breaking change (onlyinclude is uncommon but not sparse so examples are rare) and I'm unsure if there's other perspectives to this, since it's not something I've brought up in conversation before. Aidan9382 _(talk) 21:35, 1 July 2025 (UTC)[reply]

The main reason why excerpts respect onlyinclude tags is historical: it was already there when I arrived. I think it would be ok to remove that "feature", especially because it would help avoid use of onlyinclude tags in articles, which seems desirable to me. However, it'd be nice to have a list of excerpts that rely on onlyinclude tags before proceeding, so that we can fix them. The only way I can think of right now of generating such a list would be by editing Module:Excerpt to detect and categorize offending excerpts. @Aidan9382 Can you think of a simpler way? Thanks! Sophivorus (talk) 17:39, 3 October 2025 (UTC)[reply]

The only way to do it without making a module change would be to find all articles using onlyinclude via an insource search and then checking every use of excerpt for those articles, and I imagine that's gonna be impractical as there's a lot that meet both criteria. A module change shouldn't be a major issue, just make sure the category is made beforehand so it isn't a redcat, and maybe ship it at the same time as another fix if you have other fixes coming up to avoid spamming page reevaluations. Aidan9382 _(talk) 19:24, 3 October 2025 (UTC)[reply]
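To make the detection idea concrete, here is a minimal sketch in Python (the module itself is Lua) of how an excerpt could be flagged as depending on onlyinclude. The helper names `relies_on_onlyinclude` and `visible_text` are hypothetical and not part of Module:Excerpt; the tag handling is deliberately simplified.

```python
import re

# Simplified matcher for <onlyinclude>...</onlyinclude> blocks (assumption:
# tags are well-formed and not nested).
ONLYINCLUDE = re.compile(r"<onlyinclude>(.*?)</onlyinclude>", re.DOTALL | re.IGNORECASE)


def relies_on_onlyinclude(wikitext: str) -> bool:
    """True if honouring <onlyinclude> would change what gets excerpted."""
    return ONLYINCLUDE.search(wikitext) is not None


def visible_text(wikitext: str, respect_onlyinclude: bool) -> str:
    """Return the text an excerpt would start from under either policy."""
    parts = ONLYINCLUDE.findall(wikitext)
    if respect_onlyinclude and parts:
        # Current behaviour as described above: only the tagged content is seen.
        return "".join(parts)
    # Proposed behaviour: ignore the tags but keep their content in place.
    return re.sub(r"</?onlyinclude>", "", wikitext, flags=re.IGNORECASE)
```

A module-side check along these lines could add a tracking category whenever `relies_on_onlyinclude` is true for the source page, which is roughly the categorisation step discussed above.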

WikitextParser

@Aidan9382 @Certes Hi, I'm back, I hope you're doing well ! ^^ Today I restored the version of Module:Transcluder/sandbox that uses Module:WikitextParser, merged the latest changes, and fixed a few bugs. All test cases are looking good now (except one at Template:Excerpt/testcases3 but it's actually looking better in the sandbox version).

Why am I doing this? Basically, I'd like to sharpen the separation between all the generic, reusable functionality (WikitextParser), and all the arbitrary, ad-hoc functionality (Excerpt, Transclude lead excerpt, etc). Transcluder is caught somewhere in the middle. To accomplish this, I'd like to:

Move to production the new version of Transcluder that uses WikitextParser
Update Module:Excerpt, Module:Excerpt/portals and the few other modules that use Transcluder to use WikitextParser instead
Deprecate Transcluder

I think this sharper separation of concerns will make development, maintenance and documentation much easier, and may lead to a wider reuse of the generic functionality than is currently the case. Do you agree? Kind regards, Sophivorus (talk) 15:11, 18 March 2025 (UTC)[reply]

I've not looked at the code or other details but that sounds like a very good idea in principle. My original version, which had much less functionality, never aimed to be a generic wikitext parser. To make it quick to run (and write), I made several assumptions that are almost always true in practice but could break in theory. If you now have a parser that works in more general cases, that is a splendid achievement which will be useful elsewhere in Wikipedia and should certainly be made available separately. Certes (talk) 16:20, 18 March 2025 (UTC)[reply]

Glad you like the idea! After some more progress with Transcluder, I figured out it doesn't make much sense to rewrite Transcluder to use WikitextParser, only to then rewrite Excerpt to skip Transcluder. So I went ahead and started rewriting Excerpt directly. I just published the first MAJOR set of changes to Module:Excerpt/sandbox. There's still a long way to go, but I had to pause at some point. This will also be an opportunity to improve the architecture overall. Looking forward to continuing work next week! Sophivorus (talk) 14:04, 21 March 2025 (UTC)[reply]

Hi again! I'm almost done with replacing Module:Transcluder with Module:WikitextParser. All test cases look ok to me. I'll leave Module:Excerpt/sandbox for a few days in case anyone wants to review it. In the meantime I'll continue debugging. Then, if all goes well, I'll move the new version to production. Cheers! Sophivorus (talk) 13:36, 24 March 2025 (UTC)[reply]

@Certes @Aidan9382 Hi guys, today I did another run of tests and fixes, this time using real-world articles as well as test cases. We're getting closer to deployment. Next time I'll do some more testing and debugging, and if nothing big comes up, I may deploy. However, I'm a bit wary about performance and I'm not sure how to test or improve it. Also, honestly I'm not very sure about the difference between mw.ustring and string, so I'm not sure when to use each and what bugs this may lead to, especially in wikis other than enwiki. If you find a minute to look into any of this, I'll appreciate it greatly. Thanks! Sophivorus (talk) 14:19, 8 May 2025 (UTC)[reply]

string can be faster but treats text as single bytes (values 0–255), so it is only reliable for ASCII. ustring works on UTF-8 encoded strings and handles all Unicode characters, so it is safer if accented letters etc. may appear. I haven't been following the development more generally since I retired. Certes (talk) 14:27, 8 May 2025 (UTC)[reply]

I've worked on performance quite a bit for other modules, so I could take a look if you have anything of specific interest (or just want me to generally check it over). And yes, the difference between ustring and string is unicode support, but if you're extracting text between always-ASCII characters and reusing that, using regular string should be fine (in-practice testing is the best way to find out - try having cases with non-english text and seeing how they behave). Aidan9382 _(talk) 18:41, 8 May 2025 (UTC)[reply]
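The byte-versus-character distinction can be illustrated outside Lua. The sketch below uses Python's `bytes` as a stand-in for Lua's byte-oriented string library and `str` as a stand-in for mw.ustring; the helper `between` is hypothetical. It shows why extracting text between always-ASCII delimiters is safe on bytes, while cutting at an arbitrary byte offset is not.

```python
from typing import Optional

text = "café"
raw = text.encode("utf-8")  # byte view: 'é' occupies two bytes in UTF-8


def between(data: bytes, opener: bytes, closer: bytes) -> Optional[bytes]:
    """Return the bytes between two ASCII delimiters, or None if absent."""
    start = data.find(opener)
    if start == -1:
        return None
    start += len(opener)
    end = data.find(closer, start)
    if end == -1:
        return None
    # ASCII delimiters never occur inside a multi-byte UTF-8 sequence,
    # so this slice always falls on character boundaries.
    return data[start:end]
```

Here `len(text)` counts characters and `len(raw)` counts bytes, so the two differ for non-ASCII text. Slicing `raw` at a fixed offset can split a character in half, which is the kind of bug to look for when testing with non-English pages.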

@Certes Hi! I just read your user page and it's so sad you decided to leave for the reasons you did. Sad, but understandable, as I can partly agree with your assessment of the WMF. That being said, let me just THANK YOU for all the work you've done over so many years. As I often tried to express during our talks, I was always surprised and thankful for your sharp eye. This module (among many other things, I'm sure) wouldn't be the same or wouldn't even be possible without you. So farewell my friend, and I hope you find other more worthy projects where to pour your time, knowledge, skill and passion!

@Aidan9382 Yes, please! After some testing, I finally understood the difference between string and ustring, but I'm afraid that means I'll have to review all the methods and test cases, especially at WikitextParser, with this new knowledge (*sigh*). In the meantime, if you feel like doing that performance review, I'll be happy to implement your recommendations, if you don't do it yourself. Thank you!! Sophivorus (talk) 14:27, 9 May 2025 (UTC)[reply]

@Sophivorus: I've gone ahead and run through both Module:Excerpt/sandbox and Module:WikitextParser and given them some performance improvements (thanks old me for making the utility to actually be able to do this), and it now actually runs faster than the current live version by a small margin (~1.2x, and about 2x faster than what the sandbox used to be), at least in the test case I'm using. Most of the remaining computation time is just handling the template blacklist (~50% of all execution time on its own), but it's at least far less than it was before.

I don't think any of the changes I've done would've broken functionality, but I'd look them over yourself and test just to be sure, since some of the changes are subtle and there might be a case I'm not thinking of. Aidan9382 _(talk) 22:20, 9 May 2025 (UTC)[reply]

@Aidan9382 Thanks so much! I'll review, test and learn from your changes sometime this week, and hopefully deploy too. Cheers! Sophivorus (talk) 12:04, 12 May 2025 (UTC)[reply]

@Aidan9382, Certes: Well, after much development, testing and a long wait, I finally came around and updated Module:Excerpt to the latest version that uses Module:WikitextParser. I think this is a big step forward, but I'm sure we'll soon hear of some bug or other, hopefully not very serious. I'll keep watch for the next few hours and would appreciate any help. Thanks!! Sophivorus (talk) 13:26, 2 October 2025 (UTC)[reply]

Misbehaviour when excerpting a redirect to a section

I discovered a bit of odd behaviour this evening, while pottering through Existence of God. Its section on the argument from inconsistent revelations was actually a transclusion of the lead from Religious pluralism.

A monkey patch fixes the immediate problem, but this only treats a symptom of an underlying bug in the module. section, the variable used to identify the section of an article to transclude, is defined purely in terms of the argument passed to the template. This definition fails to account for the possibility that an article has been merged into another article, as a section of the latter article, as is the case with Argument from inconsistent revelations.

This is quite the oversight, but it seems like a pretty quick fix. I'd make an edit request if I knew (or wanted to know) the code better. EnronEvolved^{My Talk Page} 21:13, 26 May 2025 (UTC)[reply]

I've pushed this edit to the sandbox which should resolve that without affecting normal cases. The new behaviour from that is, if no fragment or section is specified in the call to excerpt and the target page is a redirect with a fragment, excerpt will use the fragment provided. If the redirect has no section or a fragment was specified in the template usage, it'll be unchanged. I would imagine this behaviour makes more sense intuitively than the module always ignoring the fragment. Aidan9382 _(talk) 21:25, 26 May 2025 (UTC)[reply]
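The rule described above can be written as a small decision function. This is a sketch in Python rather than the sandbox's Lua, and `effective_section` is a hypothetical name; it only captures the precedence between the section given in the template call and the fragment carried by a redirect.

```python
from typing import Optional


def effective_section(requested: Optional[str],
                      redirect_fragment: Optional[str]) -> Optional[str]:
    """Pick the section to excerpt.

    requested: section or fragment given in the {{Excerpt}} call, if any.
    redirect_fragment: the part after '#' in the redirect target, if the
    page is a redirect to a section; None otherwise.
    """
    if requested:
        # An explicit section in the call always wins (unchanged behaviour).
        return requested
    # No section requested: fall back to the redirect's fragment, if present.
    return redirect_fragment or None
```

With this rule, a call that names no section and targets a redirect to a section resolves to the redirect's fragment, while every call that names a section behaves as before.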

I can confirm that excerpting Argument from inconsistent revelations using the sandbox version of the template transcludes the right piece of the article. When is this change likely to be applied to the main template? EnronEvolved^{My Talk Page} 19:18, 27 May 2025 (UTC)[reply]

Hatnote: singular vs plural

The hatnote generated by {{Excerpt|Planet|paragraphs=1|only=paragraphs}} is:

These paragraphs are an excerpt from Planet.

Since only one paragraph is requested, the hatnote should be:

This paragraph is an excerpt from Planet.

Special:Permalink/1283596247#Planets_and_exoplanets. Jruderman (talk) 09:39, 24 June 2025 (UTC)[reply]

@Jruderman This can be achieved by setting only=paragraph instead of only=paragraphs (and when doing so, specifying paragraphs=1 becomes unnecessary, see this diff). That being said, it may still be desirable to tweak the hatnote when only paragraphs=1 is set. Sophivorus (talk) 12:24, 25 June 2025 (UTC)[reply]
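The possible tweak mentioned at the end could look something like the following Python sketch. It assumes, purely for illustration, that the hatnote wording is chosen from the raw value of the paragraphs parameter and that a value consisting of a single number means exactly one paragraph; the function name `paragraph_hatnote` is hypothetical.

```python
from typing import Optional


def paragraph_hatnote(title: str, paragraphs: Optional[str]) -> str:
    """Choose singular or plural wording for an only=paragraphs excerpt."""
    # A bare number such as "1" or "3" selects exactly one paragraph;
    # anything else ("1,2", a range, or no value) is treated as plural.
    single = paragraphs is not None and paragraphs.strip().isdigit()
    if single:
        return f"This paragraph is an excerpt from {title}."
    return f"These paragraphs are an excerpt from {title}."
```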

What am I doing wrong while trying to exclude a template?

My code is as follows:

{{Excerpt|List of Falcon 9 and Falcon Heavy launches|Booster landings|hat=no|templates=-col-float-end}}

What I see at the end of the expanded code is

</div><div class="multicol-float-clear " style="" ></div>

The above is the content in {{col-float-end}}, which appears at the end of the excerpt and which I am attempting to exclude, following the template instructions. What am I doing wrong? The article in question is Landing Zones 1 and 2, and two others. – Jonesey95 (talk) 16:15, 2 October 2025 (UTC)[reply]

I don't think you've done anything wrong here, I think this is a consequence of a pretty major overhaul done recently (see #WikitextParser above), since it works as expected on the previous version in testing. Aidan9382 _(talk) 18:24, 2 October 2025 (UTC)[reply]

This has been fixed with this edit, and your use case should now work as expected. Aidan9382 _(talk) 18:33, 2 October 2025 (UTC)[reply]

Thanks for the quick fix! – Jonesey95 (talk) 18:51, 2 October 2025 (UTC)[reply]

Bug report: poss interference from escaped pipe earlier on page

Not sure what's going on here, but I am looking to grab the lead paragraph from Boing Boing. We have this broken example:

1. broken: {{excerpt|Boing Boing|paragraphs=1|only=paragraphs|hat=no|references=no|inline=yes}}

Boing Boing is a website, first established as a zine in 1988, later becoming a group blog. Common topics and themes include technology, futurism, science fiction, gadgets, intellectual property, Disney, and left-wing politics. It twice won the Bloggies for Weblog of the Year, in 2004 and 2005. The editors are Mark Frauenfelder, David Pescovitz, Carla Sinclair, and Rob Beschizza, and the publisher is Jason Weisberger.

but this workaround is okay (note the paragraphs=1,2):

1. works: {{excerpt|Boing Boing|paragraphs=1,2|only=paragraphs|hat=no|references=no|inline=yes}} (but why?)

Boing Boing is a website, first established as a zine in 1988, later becoming a group blog. Common topics and themes include technology, futurism, science fiction, gadgets, intellectual property, Disney, and left-wing politics. It twice won the Bloggies for Weblog of the Year, in 2004 and 2005. The editors are Mark Frauenfelder, David Pescovitz, Carla Sinclair, and Rob Beschizza, and the publisher is Jason Weisberger.

One report named Boing Boing as the most popular blog in the world until 2006, when Chinese-language blogs became popular; it remained among the most widely linked and cited blogs into the 2010s.

Suspecting {{!}} in the {{for multi}}, but maybe something in the infobox? Mathglot (talk) 21:05, 2 October 2025 (UTC)[reply]

The following example seems related, in that the paragraphs=1,2 trick seems to improve matters, but in this case, that only resolves part of the problem, but not all of it. Here we are trying to excerpt the lead paragraph of Breitbart News:

2. empty: {{excerpt|Breitbart News|paragraphs=1|only=paragraphs|hat=no|references=no|inline=yes}}

Breitbart News Network (/ˈbraɪtbɑːrt/; known commonly as Breitbart News, Breitbart, or Breitbart.com) is an American far-right<ref name="FarRight">Multiple sources:

whereas this workaround with paragraphs=1,2 helps, but breaks differently than the example above:

2. semi-broken: {{excerpt |Breitbart News |paragraphs=1,2 |only=paragraphs |hat=no |references=no |inline=yes}}

Breitbart News Network (/ˈbraɪtbɑːrt/; known commonly as Breitbart News, Breitbart, or Breitbart.com) is an American far-right<ref name="FarRight">Multiple sources:

}} The site has published a number of conspiracy theories<ref name="ConspiracyTheories">Multiple sources:

Another example:

3. dumps infobox code: {{excerpt|Bustle (magazine)|paragraphs=1|only=paragraphs|references=no}}

Bustle is an online American women's magazine founded in August 2013 by Bryan Goldberg. It positions news and politics alongside articles about beauty, celebrities, and fashion trends. By September 2016, the website had 50 million monthly readers.

This time, paragraphs=3 is the workaround to grab the first paragraph:

3. grabs paragraph #1: {{excerpt|Bustle (magazine)|paragraphs=3|only=paragraphs|references=no}}

Hope this helps isolate the problem. Mathglot (talk) 21:30, 2 October 2025 (UTC)[reply]

Whatever this turns out to be, please add these examples to the testcases page so that we don't make the same mistake again. We'll make different mistakes instead, hooray! – Jonesey95 (talk) 21:35, 2 October 2025 (UTC)[reply]

Btw, I tried various other workarounds with examples 1 and 2 above, such as templates=no and templates=-Infobox website, but it didn't help in either case. Mathglot (talk) 21:45, 2 October 2025 (UTC)[reply]

Interestingly, I'm testing these with the previous version, and if anything, it's performing worse, so these might not be new problems. Aidan9382 _(talk) 21:51, 2 October 2025 (UTC)[reply]

That's actually a piece of good news, as it is very annoying for a developer when a fix for one thing introduces some other glitch. Let's be happy that wasn't the case here! Mathglot (talk) 22:11, 2 October 2025 (UTC)[reply]

If you are trying to evaluate importance, this is far from urgent and I can live with it for the foreseeable future. If there are other things you would prefer to attend to, please do. Thanks for the quick response, nevertheless. Mathglot (talk) 22:18, 2 October 2025 (UTC)[reply]

Added example 3 above. There is something funny going on in paragraph enumeration or exclusion of other item types, as this time, paragraphs=3 is the workaround to grab the first paragraph. Mathglot (talk) 23:21, 2 October 2025 (UTC)[reply]

As far as I can tell, the exact issue here is two different problems, both to do with excerpt trying to determine when the first paragraph starts:

A line that isn't a single isolated template (e.g. }}{{Conservatism US|media}} from Breitbart News or {{Use mdy dates|date=August 2024}}{{Infobox magazine from Bustle (magazine)) doesn't meet the strict criteria of the template removal regex (\n%b{} *\n from here), which means the templates just won't end up being removed (I have not created a fix for this one yet - either the regex needs to be weaker or the solution needs to be more than just a singular simple regex)
Excerpt was removing lists from the text before removing templates when trying to find where paragraphs begin, allowing for the ending of a template to be erroneously removed (e.g. *2000 (blog)}} from Boing Boing), which led to bad capturing. I've theorised a fix in Special:Diff/1314729974, but I'm unsure if this could cause different problems, though I suspect not

@Sophivorus: thoughts? Aidan9382 _(talk) 23:45, 2 October 2025 (UTC)[reply]
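As an illustration of the "more than a single regex" option for the first problem, here is a Python sketch that skips leading templates by counting brace pairs instead of requiring each template to sit alone on its own line. It is not the module's code: the function names are hypothetical, and triple-brace parameters, comments and nowiki are ignored.

```python
def skip_balanced(text: str, start: int) -> int:
    """Given text[start:] beginning with '{{', return the index just past
    the matching '}}', or -1 if the braces never balance."""
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("{{", i):
            depth += 1
            i += 2
        elif text.startswith("}}", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1


def strip_leading_templates(text: str) -> str:
    """Drop every template (and surrounding whitespace) at the start of text."""
    i = 0
    while True:
        while i < len(text) and text[i].isspace():
            i += 1
        if not text.startswith("{{", i):
            return text[i:]
        end = skip_balanced(text, i)
        if end == -1:
            # Unbalanced braces: stop rather than guess.
            return text[i:]
        i = end
```

Because list lines inside a template are consumed along with the template, this ordering also avoids the second problem. A known limitation of such a permissive approach is that a first paragraph that itself begins with an inline template would lose that template, so some extra check would still be needed.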

Aidan, I am developing some PCRE regexes offline for a different Wikipedia project, and for me, the key to success there was to use patterns for turning dotall on and off inside the regex (diff) as opposed to using the m flag; this allowed me to look at content across lines but only in the right place, but apparently Scribunto does not have that: mw:Extension:Scribunto/Lua reference manual/vi#Patterns: "Dot (.) always matches all characters, including newlines." I don't know if that is an issue generally in Lua modules, or in this one in particular. In my task, if I had had to use Scribunto, I think I would have had to break it up into multiple regexes. Mathglot (talk) 00:31, 3 October 2025 (UTC)[reply]

Lua's regex is inherently weaker than standard PCRE - flags like that just don't exist as an option, even in modern Lua versions, and that's by design. Also, maybe I'm misunderstanding something, but I'm not too sure how this is relevant here, since the problematic regex in question doesn't rely on dot for how it works.

As for fixing the bug, it's just a matter of making something that behaviourally fits what it's meant to do (removing header templates). The problem is that there can be a decent amount of minimal variation to how that looks (see the examples above) that makes it hard to do in a regex alone. It shouldn't be too hard of a fix (a weaker regex with some checking in a function could do the job, or maybe something a little more involved depending on testing), I just haven't gotten around to trying implementations yet Aidan9382 _(talk) 09:18, 3 October 2025 (UTC)[reply]

@Mathglot Boing Boing case should be fixed now.

@Jonesey95 Agree, I just added a couple test cases (diff).

@Aidan9382 I just deployed your fix for the Boing Boing case, because I instinctively thought of the same solution and I also suspect it won't introduce similar issues. As for the other cases, weakening the regex will probably cause new problems with paragraphs that either start or end with a template. I guess one solution could be this one, but perhaps you can come up with something more elegant? See the test cases here. Notice the new issue with the <ref> tag. The reported case doesn't suffer from it because it includes references=no, so I think we can ignore it for now, but just for the record, my other wikitext parser has a replaceElements and restoreElements method that could help. Cheers! Sophivorus (talk) 13:06, 3 October 2025 (UTC)[reply]

BBC anomalies

I'm quite sure this worked before, but now excerpting BBC paragraph 1 is broken, but it manifests differently depending on context. For starters, this collapse bar appears to alter it as well:

BBC anomalies

{{fake header|level=3|One}}
{{excerpt|BBC|paragraphs=1|only=paragraphs|hat=no|references=no|inline=yes}}

{{fake header|level=3|Two}}
{{excerpt|BBC|paragraphs=2|only=paragraphs|hat=no|references=no|inline=yes}}

{{fake header|level=3|Three}}
{{excerpt|BBC|paragraphs=3|only=paragraphs|hat=no|references=no|inline=yes}}

{{fake header|level=3|Four}}
{{excerpt|BBC|paragraphs=4|only=paragraphs|hat=no|references=no|inline=yes}}

{{fake header|level=3|Five}}
{{excerpt|BBC|paragraphs=5|only=paragraphs|hat=no|references=no|inline=yes}}

Displays sections One and Five at ExpandTemplates, but something really screwy on this page:

One

The British Broadcasting Corporation (BBC) is a British public-service broadcaster headquartered at Broadcasting House in London, England. Originally established in 1922 as the British Broadcasting Company, it evolved into its current state with its current name on New Year's Day 1927. The oldest and largest local and global broadcaster by stature and by number of employees, the BBC employs over 21,000 staff in total, of whom approximately 17,200 are in public-sector broadcasting.

Two

The BBC was established under a royal charter, and operates under an agreement with the Secretary of State for Culture, Media and Sport. Its work is funded principally by an annual television licence fee which is charged to all British households, companies, and organisations using any type of equipment to receive or record live television broadcasts or to use the BBC's streaming service, iPlayer. The fee is set by the British government, agreed by Parliament, and is used to fund the BBC's radio, TV, and online services covering the nations and regions of the UK. Since 1 April 2014, it has also funded the BBC World Service (launched in 1932 as the BBC Empire Service), which broadcasts in 28 languages and provides comprehensive TV, radio, and online services in Arabic and Persian.

Three

Some of the BBC's revenue comes from its commercial subsidiary BBC Studios (formerly BBC Worldwide), which sells BBC programmes and services internationally and also distributes the BBC's international 24-hour English-language news services BBC News, and from BBC.com, provided by BBC Global News Ltd. In 2009, the company was awarded the Queen's Award for Enterprise in recognition of its international achievements in business.

Four

Since its formation in 1922, the BBC has played a prominent role in British life and culture. It is sometimes informally referred to as the Beeb or Auntie. In 1923 it launched Radio Times (subtitled "The official organ of the BBC"), the first broadcast listings magazine; the 1988 Christmas edition sold 11 million copies, the biggest-selling edition of any British magazine in history.

Five

so also have a look at

view Wikipedia:Reliable sources/Perennial sources/all/BBC#Excerpt, where it displays italicized infobox wikicode, and no paragraph content;
try pasting the code between the pre tags in the collapsed code above into Special:ExpandTemplates, where it renders fake sections 'One' and 'Five' , and skips the others; section One gets paragraph 1 of the lead, and section Five incorrectly gets paragraph 2; it also renders the Infobox, which it shouldn't.

I wouldn't necessarily rush to back out any recent changes, as I looked at another 20 pages linked from WP:RSPDEMO, and BBC is the only one with a newly corrupt excerpt; all the others I looked at are still fine. Given that, I would rather just do a workaround to the BBC article itself to make it work under the current code, rather than lose any beneficial fixes recently introduced, but it's your call, as I don't know the internals. But I did want you to see this issue so you can analyze it, before I start changing BBC to work around it. One thing I notice there and might change, is {{BBC sidebar}} appended to a comment line; another is that the infobox has embedded templates. Thanks, Mathglot (talk) 23:18, 5 October 2025 (UTC)[reply]

where it renders fake sections 'One' and 'Five' , and skips the others - not quite. They are successfully rendering, but the issue is that paragraphs 1-4 (per the module) are actually just different sections of the infobox (e.g. paragraph 3 is }}\n| services = {{flat list|), with paragraph 4 also containing the first actual paragraph. It's once again an issue with the newline-separated lists within the infobox making excerpt confused. Also, interestingly, once again, this seems like it might be an issue that existed in the old version of the module, since I seem to be getting the same behaviour.

@Sophivorus: what's the best way theoretically to fix this? The main two ideas that come to mind are either fleshing out the regexes better until these cases are all handled or doing something like Module:Excerpt/portals' parse (though probably a bit simpler than what is there). I've not yet had a deep look through the new wikitext module so I'm not familiar with the call flow just yet. Aidan9382 _(talk) 08:41, 6 October 2025 (UTC)[reply]

@Mathglot Should be fixed now, thanks for the report! I also added this case to the test cases (diff) so we can keep an eye on it.

@Aidan9382 The problem in this case was with a comment among the templates, which prevented the regexes from matching, so I fixed it by removing the comments preemptively (diff), which is desirable anyway. Honestly, I'm not sure what a better approach would be, theoretically, so for now I'd say we keep improving the regexes. Sophivorus (talk) 12:45, 6 October 2025 (UTC)[reply]
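For readers following along, removing comments before any template matching amounts to something like this Python sketch. It is an illustration only, not the deployed Lua: the name `strip_comments` is hypothetical, and comments inside nowiki or pre are not given special treatment here.

```python
import re

# Non-greedy so that two separate comments are not merged into one match;
# DOTALL so that multi-line comments are removed too.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(wikitext: str) -> str:
    """Remove HTML comments so they cannot sit between or beside templates."""
    return COMMENT.sub("", wikitext)
```

After this step, a line such as a comment followed directly by a sidebar template is reduced to the template alone, which the stricter line-based template patterns can then match.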

Thanks for the fix, and for adding the test case.

Regarding test cases: imho, at a minimum, we should tag the revision of the article which manifests a bug, as the page may change, and in fact, I would have rapidly changed to work around the bug had you not fixed it so quickly. I captioned the three new test cases with the rev number, as a reminder that running those test cases against future versions of BBC or the other ones may not be valid tests. But that is still somewhat error-prone procedurally (not to mention time-consuming); wouldn't it be better just to freeze a revision manifesting the bug? That would finesse both problems.

I have made an overture in that direction: see the last test case in Template:Excerpt/testcases2#Paragraphs, which references Template:Excerpt/testcases/data/BBC. This page is a rump copy of BBC rev. 1314645660, with enough content to manifest the bug, if it reappears. The presentation of the test case title in the collapse bar at testcases2 is not pretty, but if this looks good to y'all, I think we should modify {{Test case table}} to add new param |_data= which would identify either the data repository (parent of BBC in this case) or the BBC data subpage itself and then do a better job of presenting the case so it's not so ugly as it is now.

Note to self: Probably I should subst buggy examples in my original report—as well as have an unsubsted companion, as we can no longer see the broken version since the fix. Alternatively, savvy reporters could simply create a data repository copy first, and then reference that in the bug report, for a briefer, surer report, and a testcase in the bag ahead of time, killing two birds. But I think the minimum bar is reporting the rev. of the failing case in the test case, however that gets accomplished. Would be interested in your thoughts, Aidan & Sophivorus. Mathglot (talk) 18:20, 6 October 2025 (UTC)[reply]

Problem report: got nil

List of United States tornadoes in 1950#February 11–13 events is showing "Lua error: bad argument #1 to 'lcfirst' (string expected, got nil)". The wikitext (referring to February 1950 tornado outbreak#Confirmed tornadoes) giving that is:

{{Excerpt|February 1950 tornado outbreak|Confirmed tornadoes|templates=.*|only=tables|tables=tornadoes}}

The reason for the error is that function Excerpt.filterTables has id = string.match and the resulting id is nil. That is at Module:Excerpt#L-239. Johnuniq (talk) 06:07, 6 October 2025 (UTC)[reply]

I've fixed this through this change in the sandbox to match old behaviour, however there seems to be a different issue with section handling that's causing no tables to actually be emitted unless |subsections=yes is specified (in the previous version this wasn't required for whatever reason) - @Sophivorus: is this an intentional functionality change? Aidan9382 _(talk) 08:56, 6 October 2025 (UTC)[reply]

@Johnuniq Should be fixed now, thanks for the report! You'll notice List of United States tornadoes in 1950#February 11–13 events now has three tables rather than one. That's because there are three tables with id "tornadoes". Perhaps Module:Excerpt should return only the first one, but also there shouldn't be three elements with the same id, so you might want to edit Tornado outbreak of February 11–13, 1950#Confirmed tornadoes to give different ids to each table. Cheers!

@Aidan9382 I deployed your change to matchFilter (diff), not sure why I removed the check for nil. Regarding the need for subsections=yes, I think that was actually a bug with the old version, because it makes more sense to first limit the excerpt to the desired section/subsections, and then filter the tables, files or whatever. Would you agree? So I ended up adding subsections=yes to List of United States tornadoes in 1950#February 11–13 events. Cheers! Sophivorus (talk) 13:18, 6 October 2025 (UTC)[reply]
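The nil check at issue is the usual guard around a pattern match that may find nothing. The Python sketch below mirrors the idea with `None` in place of Lua's nil; it is not the module's filterTables or matchFilter code, the names `table_id` and `filter_tables` are hypothetical, and the id pattern is a simplification that only looks at the table's opening line.

```python
import re
from typing import List, Optional

# Simplified: id attribute on the table's opening line, with or without quotes.
ID_ATTR = re.compile(r'\bid\s*=\s*"?([^"\s|]+)"?')


def table_id(table_wikitext: str) -> Optional[str]:
    """Return the table's id, or None when the opening line has no id."""
    first_line = table_wikitext.split("\n", 1)[0]
    match = ID_ATTR.search(first_line)
    return match.group(1) if match else None


def filter_tables(tables: List[str], wanted: str) -> List[str]:
    """Keep the tables whose id equals `wanted` (case-insensitive)."""
    kept = []
    for table in tables:
        found = table_id(table)
        if found is None:
            # The guard: a table without an id is skipped, not passed on
            # to string functions that expect a value.
            continue
        if found.lower() == wanted.lower():
            kept.append(table)
    return kept
```

Note that, as in the deployed behaviour described above, every table with a matching id is kept, so three tables sharing one id yield three results.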

That's fair, I was just somewhat worried about potentially existing cases like this that might be broken by it, though I suppose it's not hard to fix so I probably shouldn't be worrying about that. Aidan9382 _(talk) 13:50, 6 October 2025 (UTC)[reply]

Regression: Excerpt doesn't pick up discontiguous labeled sections with the same name

Ran into this trying to fix broken List of Seattle Sounders FC seasons excerpting in Seattle Sounders FC. For some reason the second part didn't get transcluded until it was renamed. The docs say "using {{#lst:article|fragmentname}} [...] is equivalent to using the |fragment=fragmentname parameter with this template", which makes it sound like this shouldn't be a problem. I guess in this particular case {{Excerpt}} gives no added value compared to plain {{#section}} and maybe I should have just switched to that, but the question still stands. Gamapamani (talk) 16:43, 9 October 2025 (UTC)[reply]

@Gamapamani: If I understand correctly this is a limitation of HTML and wikitext. Section "IDs", which are how browsers and the MediaWiki parser identify sections, are just HTML id attributes; section "Foo bar" fundamentally exists in the HTML as the element with id="Foo_bar". That's why when you go to User:Example#Foo bar the browser will jump to the section named "Foo bar" if one exists.

HTML requires that each id be unique in a given document. Two sections with the same name produce two identical ids, violating that; the de facto behavior that pretty much everything follows is to just "see" the id that comes first on the page and ignore any duplicate ids later in the page. HTML doesn't have any way to say which of the two elements with this id you mean, since HTML doesn't allow for duplicate IDs. So renaming sections to not be identically named is the solution, as you found. --Slowking Man (talk) 05:42, 10 October 2025 (UTC)[reply]

Sure, but that doesn't seem to be the reason in this case (this parser tag, <section />, is incompatible with an HTML element, as explained here). I didn't look at the code yesterday, but it seems that the issue is that recently the module switched from using Module:Transcluder to using Module:WikitextParser (rated as alpha), and the code in the latter doesn't check for multiple occurrences, i.e. for Module:Transcluder

=p.getSection('<section begin=zzz />aaa<section end=zzz />nnn<section begin=zzz />bbb<section end=zzz />', 'zzz')

aaabbb

whereas for Module:WikitextParser

=p.getSectionTag('<section begin=zzz />aaa<section end=zzz />nnn<section begin=zzz />bbb<section end=zzz />', 'zzz')

aaa

Gamapamani (talk) 15:06, 10 October 2025 (UTC)[reply]
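
A minimal sketch of the "collect every occurrence" behaviour shown for Module:Transcluder above, written in plain Lua so it runs outside MediaWiki. The function names getAllSectionTags and escapePattern are hypothetical, and the pattern only handles unquoted names as in the test strings; it is not the code of either module.

```lua
-- Escape Lua pattern magic characters in a literal string.
local function escapePattern(s)
  return (s:gsub("%p", "%%%0"))
end

-- Hypothetical helper: concatenate the content of ALL labeled sections
-- with the given name, rather than stopping at the first one.
local function getAllSectionTags(wikitext, name)
  local n = escapePattern(name)
  local pattern = "<section%s+begin%s*=%s*" .. n .. "%s*/>(.-)"
    .. "<section%s+end%s*=%s*" .. n .. "%s*/>"
  local parts = {}
  -- gmatch walks through every begin/end pair in document order.
  for content in wikitext:gmatch(pattern) do
    parts[#parts + 1] = content
  end
  if #parts == 0 then
    return nil -- no labeled section with that name
  end
  return table.concat(parts)
end

local input = "<section begin=zzz />aaa<section end=zzz />nnn"
  .. "<section begin=zzz />bbb<section end=zzz />"
print(getAllSectionTags(input, "zzz")) -- aaabbb
```

Using a single string.match with the same pattern would return only the first pair, which matches the "aaa" result reported for getSectionTag.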

Problem: how to remove the [[Category:Articles with excerpts]] part being added

{{help needed}} The category is added as directly inserted text, which breaks usage within other templates. Respublik (talk) 04:13, 10 October 2025 (UTC)[reply]

As-is, without modifying this module, you'd have to create a second module, #invoke this module from it, and then delete the tracking category from the module's output.

It's possible you might want to do things differently instead of using this module. You can transclude sections "directly" without this module by feeding the wiki software the right magic incantation: look at Help:Transclusion. If you can give details on what specifically you're trying to do people might be able to give more precise advice. --Slowking Man (talk) 05:24, 10 October 2025 (UTC)[reply]
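
The "delete the tracking category from the output" step could look roughly like the sketch below. It is plain Lua and only covers the string handling; stripTrackingCategory is a hypothetical name, and how the wrapper obtains the output of Module:Excerpt is deliberately left out.

```lua
-- Hypothetical helper for a wrapper module: remove every literal
-- occurrence of the tracking category link from already-rendered output.
local function stripTrackingCategory(output)
  local category = "[[Category:Articles with excerpts]]"
  -- The fourth argument (true) requests a plain-text search, so the
  -- square brackets are not interpreted as a Lua pattern character class.
  local start, finish = output:find(category, 1, true)
  while start do
    output = output:sub(1, start - 1) .. output:sub(finish + 1)
    start, finish = output:find(category, start, true)
  end
  return output
end
```

A plain find is used instead of gsub so the category name never needs pattern escaping.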

@Respublik

Done. I just added a "trackingCategories" param (diff) that, when set to "no", will skip automatic categorization. Out of curiosity, could you link to the other template(s) where you're trying to use Module:Excerpt? Sophivorus (talk) 12:24, 10 October 2025 (UTC)[reply]

@Respublik I just renamed the param to simply "track" to keep it in line with other parameters of the module. Sophivorus (talk) 12:44, 21 October 2025 (UTC)[reply]

Excerpt shows multiple paragraphs instead of single paragraph

The Alpha Centauri article used the excerpt module to show a paragraph from Stars in fiction about Alpha Centauri in fiction. This edit removed it without explanation, which brought to my attention that (as can be seen in previous revisions) the excerpt doesn't just pick up the paragraph about Alpha Centauri, but also the following paragraph about Tau Ceti, which isn't relevant. This didn't happen before; did something change in the excerpt module? SevenSpheres (talk) 18:36, 17 October 2025 (UTC)[reply]

@SevenSpheres Thanks for the report, we're on it!

@Aidan9382 Hi! I was able to simplify the issue to this (try {{Excerpt|User:Sophivorus/sandbox|Real stars|paragraphs=2}}) but still I couldn't figure it out. I'll try again later but I wanted to share my progress. Cheers! Sophivorus (talk) 11:29, 20 October 2025 (UTC)[reply]

@Sophivorus: if I'm looking at my test prints correctly, I suspect this is an issue with when the html comment is getting removed internally. When excerpt calls in to filterParagraphs, the input wikitext still contains c  e, but the parser's getParagraphs method returns the sanitised version (c e) at that index, causing removeString to not find any valid matches and therefore causing the paragraph to remain in place. Aidan9382 _(talk) 11:46, 20 October 2025 (UTC)[reply]
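
To illustrate the suspected mismatch, here is a small standalone sketch in plain Lua. The functions removeString and stripComments below are simplified, hypothetical stand-ins written for this example, not the module's or the parser's real code, and the sample text is made up.

```lua
-- Simplified stand-in: remove the first literal occurrence of target.
-- Returns the new text and whether anything was removed.
local function removeString(text, target)
  local start, finish = text:find(target, 1, true) -- plain-text search
  if not start then
    return text, false
  end
  return text:sub(1, start - 1) .. text:sub(finish + 1), true
end

-- Simplified stand-in for the sanitising step: drop HTML comments.
local function stripComments(text)
  return (text:gsub("<!%-%-.-%-%->", ""))
end

local raw = "a b\n\nc <!-- d --> e\n\nf g"
-- The paragraph as a sanitising parser might report it (comment removed).
local paragraph = stripComments("c <!-- d --> e")

-- Searching the unsanitised text for the sanitised paragraph finds nothing,
-- so the paragraph stays in place.
local unchanged, found = removeString(raw, paragraph)

-- Sanitising the text first makes both sides agree, so removal succeeds.
local cleaned, foundAfter = removeString(stripComments(raw), paragraph)
```

Here found is false and unchanged equals raw, whereas foundAfter is true. The point is only that both sides of the comparison need to go through the same sanitising step.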

@SevenSpheres Issue should be fixed now. Feel free to re-add the excerpt if you want.

@Aidan9382 Good catch, that was it! I fixed it by improving getParagraphs with frontier patterns (diff). I think it's more readable, robust and maintainable now. Sophivorus (talk) 12:49, 21 October 2025 (UTC)[reply]

Excerpts of infoboxes that include Wikidata data

Hiya,

I am rewriting some pages and, as part of that, I am excerpting the template Template:Infobox Russian federal subject. The template automatically pulls Wikidata info from that page to get its time zone data, but this doesn't work on the page in which the template is being embedded. For an example, see Administrative divisions of Kostroma Oblast. Is there a way I can work around this to include that data? Thanks, EatingCarBatteries _{(contributions, talk)} 20:57, 23 October 2025 (UTC)[reply]