Jump to content

Wikipedia:Request a query

From Wikipedia, the free encyclopedia

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of pages that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

The database replicas do not have access to page content, so requests which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references. However, someone may be able to assist by querying in another way (for example, checking for external links rather than references) or suggest an alternative tool.

Non-IP user-talk pages with "Blanked IP talk"

[edit]

Today I came across a still-present mistaken bot edit from several years ago. It made that same edit twice to that same user-talk in a several-day timeframe, so I am wonderif if there are any other such edits that were made and have still not been corrected. User:BD2412bot has been inactive for a few years, but appears to have been making many thousands of edits in the timeframe of that bad edit so I can't easily scan its contributions-list by eye. Is it possible to generate a list of all user-talk pages of non-IP editors that have {{Blanked IP talk}}? DMacks (talk) 15:31, 2 November 2025 (UTC)[reply]

I think quarry:query/98734 should be what you're looking for? I outputted the pages as full URLs but can change it to bare titles if preferable. J11csd (talk) 16:26, 2 November 2025 (UTC)[reply]
Messed up the link but fixed now. J11csd (talk) 16:29, 2 November 2025 (UTC)[reply]
Thanks! No problem for me to convert among formats, once I had the actual information in some format. DMacks (talk) 22:07, 2 November 2025 (UTC)[reply]

Simple (I think) category query

[edit]

Could someone create a query that does the following:

  • Looks for articles + drafts in Category:CS1 maint: url-status
  • Checks if the page matches a regex that I give it: ( \{\{(?=\s*cite)(?=.*?url-status\s*=\s*live)(?!.*?archive-url).*?\}\} )
  • Sorts the pages by the number of matches it has descending, then limiting the number of rows to 900.
  • The output will be one column with the article name, another with the number of matches

This'll be be a page that I can run Citation bot on to fix the common CS1 error. EatingCarBatteries (contributions, talk) 06:05, 4 November 2025 (UTC)[reply]

The database replicas do not have access to page content, so queries which require checking wikitext cannot be answered with database queries. (Also, that regex doesn't do what you probably think it does. Try testing against "{{cite}}url-status=live" and "{{cite url-status=live}}archive-url".) —Cryptic 06:54, 4 November 2025 (UTC)[reply]
Good to know, thank you
Yeah I'm obviously by no means an expert. That regex is AI-generated as I'm still learning, but I have successfully ran it on dumps in AutoWikiBrowser. I knew there were some misses, but I figured that it got most of the stuff that it needed. EatingCarBatteries (contributions, talk) 07:13, 4 November 2025 (UTC)[reply]

How many page creations a day by IPs and non-autoconfirmed accounts in draftspace?

[edit]

In phab:T409366, we discovered that temporary accounts can't create drafts due to a bug. I'd be interested in figuring out how many drafts this group created on a random sample day from before temporary accounts, to figure out the impact of the bug. For example:

SELECT COUNT(*)
FROM revision
WHERE rev_parent_id = 0 /* first revision of a page */
AND rev_timestamp > 20251001000000
AND rev_timestamp < 20251001235900
AND (
  /* revision author is an IP */
  OR /* revision author is not autoconfirmed */
)

But that doesn't use the right indexes and the revision table has a billion rows, so needs optimization to work. –Novem Linguae (talk) 01:04, 6 November 2025 (UTC)[reply]

Unless you want to go back a very long time, the creation log is better than searching revision - it's smaller, has more appropriate indexes, it keeps the entry even if the page gets deleted, and it shows where the page was originally created rather than where it is now. The last is a major win for drafts that got accepted - you wouldn't be able to distinguish them just with revision - but it's going to have some false negatives for e.g. userspace drafts that got moved to draftspace.
The harder part will be verifying that a user wasn't autoconfirmed when they made the draft. It's not difficult to verify that the account was registered less than four days before the creation log entry, but the most straightforward way to check whether they had ten previous edits is going to be very, very slow when it runs across a user who has a couple million edits. —Cryptic 01:24, 6 November 2025 (UTC)[reply]
If you accept an alternate-world definition of "autoconfirmed" where you have to have 10 currently non-deleted edits, quarry:query/98850 has all results for your October 1 test date, and quarry:query/98851 has counts for each day in 2025. Those are upper bounds; it's likely very common for such users to have deleted edits, so these both will include creations by users that actually were autoconfirmed at the time. If you need accurate numbers, I'd suggest going through the October 1 hits manually - there shouldn't be any false negatives other than the moved pages I mentioned above. —Cryptic 02:41, 6 November 2025 (UTC)[reply]
...or at least, there wouldn't have been if I hadn't accidentally omitted creations by users with fewer than ten currently-live edits, whether from before or after the page creation in question. When corrected, that approximation had 177 hits on October 1, not the 74 I found before.
Current versions of both queries check for deleted edits properly too now, unless I did something else similarly dumb. (It's been a very long day.) I can't say I'm at all proud of how I went about it, either. And it doesn't even make a whole lot of difference; including the deleted edit check only removed five draft creations from October 1, and similar numbers from other days in 2025. —Cryptic 04:34, 6 November 2025 (UTC)[reply]
FYI, bug fixed. Thanks @Novem Linguae for it. :) – robertsky (talk) 08:23, 6 November 2025 (UTC)[reply]

List of transclusions

[edit]

I am looking for the list of pages that transclude {{Annual readership}}, or one of its redirects. The count tool reports a tally of 53,098, but the 'What links here?' tool (linked as 'sorted list' at Special:WhatLinksHere/Template:Annual_readership) never returns. (I don't think the problem is a volume issue, or flooding the browser; I tried the 'sorted list' link from a template that has 50 transclusions, and it never returned, either.) Can you provide a query that will produce the sorted list? There are a half a dozen redirects; if that complicates the solution, it's fine to just drop redirects. Adding interested party Andrew Davidson. Thanks! Mathglot (talk) 20:57, 8 November 2025 (UTC)[reply]

petscan:40332113. —Cryptic 21:11, 8 November 2025 (UTC)[reply]
Thank you! And I even knew about Petscan, but forgot I could use it for that. Thanks again, Mathglot (talk) 22:41, 8 November 2025 (UTC)[reply]
Yes, and thanks from me too -- it's a useful addition to my toolkit. Andrew🐉(talk) 23:20, 8 November 2025 (UTC)[reply]
Also can get it from the command line with wikiget: wikiget -b "Template:Annual readership" -tt > out.txt .. -- GreenC 17:11, 12 November 2025 (UTC)[reply]
[edit]

Could someone please run the following query for me? I am unable to do so myself. Thank you!

SELECT DISTINCT page_title FROM page JOIN externallinks ON el_from = page_id WHERE page_namespace = 0 AND (el_to_domain_index LIKE 'com.twitter.%' OR el_to_domain_index LIKE 'com.x.%'); ~2025-32654-48 (talk) 22:30, 10 November 2025 (UTC)[reply]

No results for that; el_to_domain_index always begins with a protocol. quarry:query/99001 has what you meant. (Spoiler: there's lots.) —Cryptic 22:39, 10 November 2025 (UTC)[reply]
[edit]

Could someone please modify the query at quarry:query/99001? I need to count citations to twitter.com or x.com, but only based on page versions from June 2023. This would likely require joining with the `revision` table and filtering on `rev_timestamp` to be between '20230601000000' and '20230630235959'. I understand that extracting dates from the citation text itself isn't possible through Quarry. Thank you! ~2025-32977-37 (talk) 11:27, 12 November 2025 (UTC)[reply]

SQL query request for new citations to twitter.com/x.com in August 2023

[edit]

Could someone please run the following query for me? I am unable to do so myself. The query is intended to count the number of new citations to 'twitter.com' or 'x.com' for each day of August 2023. Thank you!

-- This query counts the number of new citations to 'twitter.com' or 'x.com'
-- for each day of August 2023.
-- NOTE: It cannot extract the publication date from the citation text. Instead,
-- it uses the timestamp of the earliest revision of the page containing the
-- link as a proxy for the addition date.

SELECT
    first_appearance_day,
    COUNT(*) AS new_citations_count
FROM (
    -- Subquery to find the first revision timestamp for each unique external link
    SELECT
        el.el_id,
        DATE_FORMAT(MIN(rev.rev_timestamp), '%Y-%m-%d') AS first_appearance_day
    FROM
        externallinks AS el
    JOIN
        revision AS rev ON el.el_from = rev.rev_page
    WHERE
        -- Filter for twitter.com and x.com links using the indexed domain column
        (el.el_to_domain_index LIKE 'http://com.twitter.%' OR el.el_to_domain_index LIKE 'https://com.twitter.%' OR el.el_to_domain_index LIKE 'http://com.x.%' OR el.el_to_domain_index LIKE 'https://com.x.%')
    GROUP BY
        el.el_id
) AS link_creations
WHERE
    -- Filter the results to include only links first appearing in August 2023
    first_appearance_day BETWEEN '2023-08-01' AND '2023-08-31'
GROUP BY
    first_appearance_day
ORDER BY
    first_appearance_day ASC;

~2025-33047-06 (talk) 16:04, 12 November 2025 (UTC)[reply]

Not sure what you are asking. Are you unable to use Quarry and run it yourself? -- GreenC 17:07, 12 November 2025 (UTC)[reply]
This isn't going to do what you want. If it completed (it wouldn't), it would show, of the pages that currently have an external link to twitter or x, the ones that were created in August 2023. There's no guarantee that the links were present since creation, and no reason to think that no links were newly-added to existing pages in that month.
To find when the links were actually added, you'd have to use the text table, and compare the wikitext between past revisions. That's not practical, even if you only look at pages that currently have a matching external link; and the public replicas don't have text anyway. Best bet is to download a consecutive pair of database dumps from around then, extract a list of pages containing external links from each, and compare them. —Cryptic 18:25, 12 November 2025 (UTC)[reply]
Nonetheless, quarry:query/99051 has what your query above would've resulted in, if it worked (unlikely) and if it didn't double-count pages for each link they have matching one of those formats. —Cryptic 19:27, 12 November 2025 (UTC)[reply]