Wikipedia:Request a query
This is a page for requesting one-off database queries for certain criteria. Users who are interested in and able to run SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
- If you are interested in writing SQL queries or helping out here, visit our tips page.
- If you need to obtain a list of pages that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude some template, etc.
- If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
- For long-term review and checking, database reports are available.
The database replicas do not have access to page content, so queries which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references. However, someone may be able to assist by querying in another way (for example, checking for external links rather than references) or suggest an alternative tool.
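As a rough illustration of that workaround, a query along the following lines can stand in for a wikitext search for citations of a given site. This is only a sketch: the domain is a placeholder, and the reversed-domain format of el_to_domain_index on the current externallinks schema is an assumption to check against the live tables.
-- List mainspace pages linking to a given domain via the externallinks table,
-- instead of searching wikitext for references. The replicas store the domain
-- reversed (e.g. "https://com.example."), so the LIKE prefix below is an assumption.
SELECT page_title
FROM page
JOIN externallinks ON el_from = page_id
WHERE page_namespace = 0
  AND el_to_domain_index LIKE 'https://com.example.%'
LIMIT 100;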
Admins using the block button
Can you tell me how many admins blocked anyone (registered or otherwise) during the last year? We have 827 admins, and I have heard that most of them use their admin tools only occasionally, so I would expect the number who use the block button specifically to be low – but maybe you'll surprise me. The total number (no need for individual names) is enough for me. WhatamIdoing (talk) 20:37, 6 October 2025 (UTC)
- @WhatamIdoing: According to this query, 414 of them made at least one block in the past year, out of which 41 made more than 1000 blocks each (~50% and ~5% of all admins, respectively). NguoiDungKhongDinhDanh 21:03, 6 October 2025 (UTC)
- quarry:query/97803 has totals for different uses of the block button too. Do you need me to filter accounts that currently aren't admins anymore? I can think of at least one where that's the case. —Cryptic 21:04, 6 October 2025 (UTC)
- No, that's good enough. Thanks! WhatamIdoing (talk) 21:35, 6 October 2025 (UTC)
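For readers curious what such a count looks like, here is a minimal sketch (not the query linked above) using the logging table and user_groups; it assumes the usual replica views and counts current sysops with at least one block log entry in the past year.
-- Count current admins with at least one block in the last year (a sketch, untested).
SELECT COUNT(DISTINCT actor.actor_user) AS blocking_admins
FROM logging_userindex
JOIN actor ON actor.actor_id = log_actor
JOIN user_groups ON ug_user = actor.actor_user AND ug_group = 'sysop'
WHERE log_type = 'block'
  AND log_action = 'block'
  AND log_timestamp >= DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 YEAR), '%Y%m%d%H%i%S');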
Why is this query so slow?
This query of mine has been running for about 24 hours. Its content is as follows:
SELECT
    CONCAT('Thảo luận Thành viên:', page.page_title) AS 'Page name',
    COUNT(revision.rev_id) AS 'Revisions'
FROM page
JOIN revision ON revision.rev_page = page.page_id
JOIN actor ON actor.actor_id = revision.rev_actor
WHERE
    page.page_namespace = 3 AND
    page.page_is_new = 0 AND
    actor.actor_name = 'TuanminhBot' AND
    is_ipv4(page_title)
GROUP BY page.page_id
HAVING
    COUNT(DISTINCT revision.rev_actor) = 1 AND
    COUNT(revision.rev_id) > 1
ORDER BY COUNT(revision.rev_id) DESC
I expect this to return somewhere between 10,000 and 300,000 rows, give or take. An almost identical query, intended to return only the number of pages, has also been running for 21 hours. NguoiDungKhongDinhDanh 08:41, 7 October 2025 (UTC)
- The actor view was introduced in a 2019 downgrade. It joins the actual actor table, which is hidden, to eight other tables, which slows down many queries dramatically. Using the actor_revision view instead may undo some of the damage. Certes (talk) 10:49, 7 October 2025 (UTC)
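To make that suggestion concrete, this is the same query with the join switched from the wide actor view to actor_revision; a sketch only, untested, with everything else left unchanged.
-- The original query, with actor swapped for the narrower actor_revision view.
SELECT
    CONCAT('Thảo luận Thành viên:', page.page_title) AS 'Page name',
    COUNT(revision.rev_id) AS 'Revisions'
FROM page
JOIN revision ON revision.rev_page = page.page_id
JOIN actor_revision ON actor_revision.actor_id = revision.rev_actor
WHERE
    page.page_namespace = 3 AND
    page.page_is_new = 0 AND
    actor_revision.actor_name = 'TuanminhBot' AND
    is_ipv4(page_title)
GROUP BY page.page_id
HAVING
    COUNT(DISTINCT revision.rev_actor) = 1 AND
    COUNT(revision.rev_id) > 1
ORDER BY COUNT(revision.rev_id) DESC;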
- @Certes: quarry:query/97845 (actor) took 105 seconds to return 100 rows, compared to quarry:query/97844 (actor_revision)'s 117 seconds. Am I using it wrong? NguoiDungKhongDinhDanh 11:23, 7 October 2025 (UTC)
- The benefit in using actor_revision over actor is that, as Certes says, it simplifies the actually-executed query, which in some cases makes the optimizer choose a better query plan. Most of the time it won't matter, and it probably won't here. I wrote a big long thing here about using revision_userindex instead of revision - that choice of view does make a difference, since it allows use of indices including rev_actor - but since TuanminhBot has an edit count of some 34.5 million, it's still more selective to look at page first. The query plan (either for revision or revision_userindex) looks like:
query plan:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | actor | const | PRIMARY,actor_name | actor_name | 257 | const | 1 | Using temporary; Using filesort |
| 1 | PRIMARY | page | index | PRIMARY,page_name_title | PRIMARY | 4 | NULL | 15611429 | Using where |
| 1 | PRIMARY | revision | ref | rev_page_actor_timestamp,rev_page_timestamp | rev_page_timestamp | 4 | viwiki.page.page_id | 1 | Using where |
| 14 | SUBQUERY | revision | ref | rev_actor_timestamp | rev_actor_timestamp | 8 | const | 33267992 | Using where |
| 13 | SUBQUERY | logging | ref | log_actor_type_time,log_type_action,log_type_time,log_actor_time,log_actor_deleted | log_actor_type_time | 8 | const | 6444298 | Using index condition; Using where |
| 12 | SUBQUERY | recentchanges | ref | rc_actor,rc_actor_deleted | rc_actor_deleted | 8 | const | 1 | Using where; Using index |
| 11 | SUBQUERY | filearchive | ref | fa_actor_timestamp,fa_actor_deleted | fa_actor_deleted | 8 | const | 1 | Using where; Using index |
| 10 | SUBQUERY | oldimage | ref | oi_actor_timestamp,oi_actor_deleted | oi_actor_deleted | 8 | const | 1 | Using where; Using index |
| 9 | SUBQUERY | image | ref | img_actor_timestamp | img_actor_timestamp | 8 | const | 1 | Using index |
| 8 | SUBQUERY | block | ref | block_actor_deleted | block_actor_deleted | 9 | const,const | 1 | Using index |
| 7 | SUBQUERY | archive | ref | ar_actor_timestamp,ar_actor_deleted | ar_actor_deleted | 8 | const | 2576075 | Using where; Using index |
| 6 | SUBQUERY | block_target | ref | PRIMARY,bt_user | bt_user | 5 | const | 1 | Using index |
| 6 | SUBQUERY | block | ref | bl_target | bl_target | 4 | viwiki.block_target.bt_id | 1 | Using where |
| 5 | SUBQUERY | user | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
- where you can see that it had to fall back on a full table scan of page, with an estimate of 15.6 million rows to look at. (All the entries with select_type "SUBQUERY" are for resolving the actor view; using actor_revision will cut them down to just a single entry, the one currently numbered 14.) The only thing I can think of to improve that is to add another condition on page_title so that it only looks at page titles starting with digits (since is_ipv4(page_title) isn't indexable) - adding AND page.page_title BETWEEN '1' AND ':' to the end of your WHERE clause gets the estimated row count on page down to 7805216, about half. Which gets me to stop and think again, because 15 million isn't all that high to start with - I was thinking in terms of enwiki numbers, where it's around 58 million. Running SHOW ANALYZE for the amended query shows that it gets through all those rows in page reasonably quickly, then hangs. What I'm guessing is happening (and I'm running right up against the edge of my competence here, and am probably already over it) is that either it's stuck counting and sorting those results (for the HAVING clause), or that there are a lot - lot - of results, enough to overwhelm output. Quarry can't really deal with more than 100k rows in a result, though that shouldn't affect me at toolserver. The next thing I'd try is to find a couple of pages that you know will be output, for example "User talk:193.42.1.7", and restricting page_title to something like page_title LIKE '193.42.1.%' to see if you get results that way. I'm out of time to look at this; I can investigate some more tomorrow if what I've written so far doesn't help. —Cryptic 14:04, 7 October 2025 (UTC)
- Also, this query won't do what the title says it does. The WHERE clause is dealt with well before HAVING, so informally, you're getting a bunch of pages, then all their revisions, then removing revisions not made by TuanminhBot, then checking to see if there was only one user among what's left; and that last part will always be true. —Cryptic 14:24, 7 October 2025 (UTC)
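A sketch of that kind of narrowed test query (untested): the original query with the indexable page_title prefix added and the HAVING COUNT(DISTINCT ...) = 1 condition, which the WHERE clause already makes redundant, dropped. The 193.42.1.% prefix is only the example range mentioned above.
SELECT CONCAT('Thảo luận Thành viên:', page.page_title) AS 'Page name',
       COUNT(revision.rev_id) AS 'Revisions'
FROM page
JOIN revision ON revision.rev_page = page.page_id
JOIN actor ON actor.actor_id = revision.rev_actor
WHERE page.page_namespace = 3
  AND page.page_is_new = 0
  AND page.page_title LIKE '193.42.1.%'   -- indexable prefix; widen once this works
  AND is_ipv4(page.page_title)
  AND actor.actor_name = 'TuanminhBot'
GROUP BY page.page_id
HAVING COUNT(revision.rev_id) > 1
ORDER BY COUNT(revision.rev_id) DESC;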
- @Cryptic: Thanks. I'm {{User SQL-0.5}}, so I unfortunately don't understand everything you said. Here's what I could make of it:
  - Add AND page.page_title BETWEEN '1' AND ':' to limit the number of indexed pages.
  - Try adding page_title LIKE '193.42.1.%' and see if the query runs faster.
  - The actor.actor_name = 'TuanminhBot' condition renders HAVING COUNT(DISTINCT revision.rev_actor) = 1 useless.
- I came up with quarry:query/97847 (~64,000 rows in ~50 seconds). That's much faster than before, but the pages returned are not guaranteed to be created by TuanminhBot. How do I factor that into the query? NguoiDungKhongDinhDanh 17:53, 7 October 2025 (UTC)
- That's all correct. The way you properly limit it to pages where all the edits were made by TuanminhBot (instead of just all the edits being made by the same user) is to stuff all that into a subquery, then select the pages where the single user was the bot, as in quarry:query/97857. Turns out all but one of those pages are; the only exception was vi:User talk:1.52.1.184. Since you're getting that many results for 1.52.*, you're probably going to have to run separate queries for each pair of digits (0-255).(0-255).* - if the query had returned 640,000 results instead of 64,000, Quarry wouldn't have been able to deal with it. I was hoping you'd be able to get by on page_title LIKE '1.%', another query for page_title LIKE '2.%', and so on. —Cryptic 00:51, 8 October 2025 (UTC)
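In outline, that subquery approach looks something like the following; a sketch under the same assumptions, not the exact text of quarry:query/97857.
-- First collect talk pages whose >1 edits all come from a single actor,
-- then keep only those where that single actor is TuanminhBot.
SELECT CONCAT('Thảo luận Thành viên:', t.page_title) AS 'Page name',
       t.revisions AS 'Revisions'
FROM (
    SELECT page.page_id,
           page.page_title,
           COUNT(revision.rev_id) AS revisions,
           MIN(revision.rev_actor) AS only_actor   -- well-defined: only one distinct actor
    FROM page
    JOIN revision ON revision.rev_page = page.page_id
    WHERE page.page_namespace = 3
      AND page.page_title LIKE '1.%'               -- run again for '2.%', '3.%', ...
      AND is_ipv4(page.page_title)
    GROUP BY page.page_id, page.page_title
    HAVING COUNT(DISTINCT revision.rev_actor) = 1
       AND COUNT(revision.rev_id) > 1
) AS t
JOIN actor ON actor.actor_id = t.only_actor
WHERE actor.actor_name = 'TuanminhBot'
ORDER BY t.revisions DESC;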
- page.page_title LIKE '1.%' returns ~258,000 rows in ~211 seconds. I suppose that's good enough. Thanks! NguoiDungKhongDinhDanh 06:06, 8 October 2025 (UTC)
Inactive rollbackers
Could someone create/update a query to tell me how many of the 6,978 non-admins who have the MediaWiki rollback user right haven't edited for 3+ years? (There won't be any admins.) This should be similar to Wikipedia talk:Autopatrolled#RfC: Inactivity requirements for autopatrolled. WhatamIdoing (talk) 21:40, 11 October 2025 (UTC)
- The same query as in Wikipedia:Database reports/Inactive users with advanced permissions can do this by editing the constants at the start. I've done a one-time run at quarry:query/97992. —Cryptic 01:11, 12 October 2025 (UTC)
- That says we have 2,889 inactive accounts with MediaWiki rollback that haven't edited for three years, which is 41%. Thanks. I'm going to make the same proposal at Wikipedia talk:Rollback as we did at WT:Autopatrolled earlier this year. WhatamIdoing (talk) 02:18, 12 October 2025 (UTC)
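For reference, a minimal sketch of the kind of count involved (not the database-report query itself); it assumes the usual replica views and treats accounts whose latest edit is more than three years old as inactive, which is a simplification of the report's actual activity rules.
-- Count non-admin rollbackers whose most recent edit is over three years old (a sketch).
SELECT COUNT(*) AS inactive_rollbackers
FROM user
JOIN user_groups rb ON rb.ug_user = user_id AND rb.ug_group = 'rollbacker'
LEFT JOIN user_groups sysop ON sysop.ug_user = user_id AND sysop.ug_group = 'sysop'
WHERE sysop.ug_user IS NULL                       -- exclude admins
  AND COALESCE(
        (SELECT MAX(rev_timestamp)
         FROM revision_userindex
         JOIN actor_revision ON actor_id = rev_actor
         WHERE actor_user = user_id),
        '19700101000000'                          -- accounts with no edits count as inactive
      ) < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 3 YEAR), '%Y%m%d%H%i%S');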
Under-categorized pages
Hi. Is it possible to generate a list of articles by category count? This would need to exclude hidden cats. I'm specifically interested in biographical articles beyond those already identified as needing additional categories. So I guess something like ... sits within Category:Births by year or has Category:Living people and total number of categories is x (would suggest starting with two). Thanks!! Jevansen (talk) 23:27, 25 October 2025 (UTC)
- Yes, it's possible. I already have a per-project query like this here. It shows articles with the fewest categories (4 or fewer; the number can be changed), and excludes hidden cats and stub cats. If you need me to refine it, let me know. Stefen 𝕋ower's got the power!!1! Gab • Gruntwerk 23:52, 25 October 2025 (UTC)
- Excellent, I'll play around with that. Thanks Stefen. Jevansen (talk) 23:59, 25 October 2025 (UTC)
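A sketch of what such a query can look like (not the linked per-project query), assuming categorylinks still exposes the cl_to column on the replicas and using page_props to skip hidden categories; the seed category and threshold are placeholders to adjust.
-- Biographies (here: pages in Category:Living people) with at most two visible categories.
SELECT p.page_title,
       COUNT(cl.cl_to) AS visible_categories
FROM page p
JOIN categorylinks living ON living.cl_from = p.page_id
                         AND living.cl_to = 'Living_people'
JOIN categorylinks cl ON cl.cl_from = p.page_id
LEFT JOIN page cat ON cat.page_namespace = 14
                  AND cat.page_title = cl.cl_to
LEFT JOIN page_props pp ON pp.pp_page = cat.page_id
                       AND pp.pp_propname = 'hiddencat'
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND pp.pp_page IS NULL      -- keep only non-hidden categories in the count
GROUP BY p.page_id, p.page_title
HAVING COUNT(cl.cl_to) <= 2
LIMIT 1000;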