Difference between revisions of "User:BenSbot/Code2"

From JoCopedia
m (woops, missed a colon)
(Updated code and added Explanation)
Line 70:
     setlinks = ref.findall(Setlist)
     for x in setlinks:
-        Entry = [x,Date,City,Venue,"No",title]
+        Entry = [x,Date,City,Venue," ",title]
         linkslist.append(Entry)
Line 102:
     if count != 0:
         tablelist.sort(lambda x,y: cmp(x[1], y[1]))
-        text = "\"\'\'\'" + str(a.title()) + "\'\'\'\" was played at the following concerts: \n\n" + "{| border=\"1\"\n| \'\'\'Date\'\'\'\n| \'\'\'Location\'\'\'\n| \'\'\'Venue\'\'\'\n| \'\'\'Encore?\'\'\'\n"
+        text = "\"\'\'\'" + str(a.title()) + "\'\'\'\" was played at the following concerts: \n\n" + "{|class=\"wikitable sortable\" background = \"white\" border = \"1px solid rgb(153, 153, 153)\" cellpadding = \"2%\" rules = \"all\"\n!\'\'\'Date\'\'\'!!\'\'\'Location\'\'\'!!\'\'\'Venue\'\'\'!!\'\'\'Encore?\'\'\'\n"
         for b in tablelist:
             text = text + "|-\n"
-            text = text + "| [[" + str(b[5]) + "|" + str(b[1]) + "]]"
+            text = text + "| <span style=\"display:none\">&</span>[[" + str(b[5]) + "|" + str(b[1]) + "]]"
             text = text + "\n| " + str(b[2])
             text = text + "\n| " + str(b[3])
Line 119:
 print "fin"
 </pre>

Revision as of 16:11, 11 August 2008

Here is the code for the second function I was programmed to do. If you have any questions, ask on the discussion page.

import wikipedia
import catlib
import pagegenerators
import re


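text = ""
site = wikipedia.getSite()
# One [link, date, city, venue, encore, show title] entry per song played
linkslist = []

# Encore markers such as :'''Encore''' at the start of a line. Because of
# re.S the trailing .* runs to the end of the page, so each of these
# patterns matches from its marker onwards.
rea = re.compile('^:\\s*\\W{3,3}(First\\s*)?Encores?\\W{3,3}.*' , re.I | re.S | re.M)
reb = re.compile('^:\\s*\\W{3,3}Encores?\\W{3,3}.*' , re.I | re.S | re.M)
rec = re.compile('^:\\s*\\W{3,3}First\\s*Encores?\\W{3,3}.*' , re.I | re.S | re.M)
red = re.compile('^:\\s*\\W{3,3}Second\\s*Encores?\\W{3,3}.*' , re.I | re.S | re.M)
# Any non-blank line that does not start with "#" (not a numbered list item)
ree = re.compile('^[^#\\n].*$' , re.M)
# Internal links such as [[Song title]]
ref = re.compile('\\[\\[[^\\]]*\\]\\]')
# "* City: ..." and "* Venue: ..." lines; the "2" variants match only the label
reg = re.compile('^\\*\\s*City:.*$' , re.M)
reg2 = re.compile('\\*\\s*City:\\s*')
reh = re.compile('^\\*\\s*Venue:.*$' , re.M)
reh2 = re.compile('\\*\\s*Venue:\\s*')
# Dates in YYYY-MM-DD form
rei = re.compile('(19|20)\\d\\d[-](0[1-9]|1[012])[-](0[1-9]|[12][0-9]|3[01])')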

showscat = catlib.Category(site,'Category:Shows')
showslist = list(pagegenerators.CategorizedPageGenerator(showscat))
for show in showslist: 
    page = show.get()
    title = show.title()
    
    Setlist = ree.sub("",rea.sub("",page))
    
    # Each section starts at its marker; anything from a Second Encore
    # marker onwards is cut off before the non-list lines are removed
    Encore = ""
    a = reb.search(page)
    if a is not None:
        Encore = ree.sub("",red.sub("",a.group()))
    
    FirstEncore = ""
    b =  rec.search(page)
    if b is not None:
        FirstEncore = ree.sub("",red.sub("",b.group()))
    
    SecondEncore = ""
    c = red.search(page)
    if c is not None:
        SecondEncore = ree.sub("",c.group())
    
    City = "Unknown"
    d = reg.search(page)
    if d is not None:
        City = reg2.sub("",d.group())
    
    Venue = "Unknown"
    e = reh.search(page)
    if e is not None:
        Venue = reh2.sub("",e.group())

    # The date is read from the link to the show page, not from its text
    Date = ""
    g = rei.search(show.aslink())
    if g is not None:
        Date = g.group()
        
    
    setlinks = ref.findall(Setlist)
    for x in setlinks:
        Entry = [x,Date,City,Venue," ",title]
        linkslist.append(Entry)
    
    encorelinks = ref.findall(Encore)
    for x in encorelinks:
        Entry = [x,Date,City,Venue,"Yes",title]
        linkslist.append(Entry)
    
    firstencorelinks = ref.findall(FirstEncore)
    for x in firstencorelinks:
        Entry = [x,Date,City,Venue,"First",title]
        linkslist.append(Entry)
    
    secondencorelinks = ref.findall(SecondEncore)
    for x in secondencorelinks:
        Entry = [x,Date,City,Venue,"Second",title]
        linkslist.append(Entry)
    
    
songscat = catlib.Category(site,'Category:Songs')
songslist = list(pagegenerators.CategorizedPageGenerator(songscat))
for a in songslist: 
    tablelist = []
    count = 0
    
    for b in linkslist:
        if a.aslink().lower() == b[0].lower():
            tablelist.append(b)
            count = count + 1
    
    if count != 0:
        # YYYY-MM-DD dates sort chronologically when compared as strings
        tablelist.sort(key=lambda entry: entry[1])
        text = "\"\'\'\'" + str(a.title()) + "\'\'\'\" was played at the following concerts: \n\n" + "{|class=\"wikitable sortable\" background = \"white\" border = \"1px solid rgb(153, 153, 153)\" cellpadding = \"2%\" rules = \"all\"\n!\'\'\'Date\'\'\'!!\'\'\'Location\'\'\'!!\'\'\'Venue\'\'\'!!\'\'\'Encore?\'\'\'\n"
        for b in tablelist:
            text = text + "|-\n"
            text = text + "| <span style=\"display:none\">&</span>[[" + str(b[5]) + "|" + str(b[1]) + "]]"
            text = text + "\n| " + str(b[2])
            text = text + "\n| " + str(b[3])
            text = text + "\n| " + str(b[4])
            text = text + "\n"
        
        text = text + "|}\n\n[[Category:Show Statistics]]"
        page = wikipedia.Page(site, (str(a.title()) + "/Concerts"))
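        # page.get() raises an error for a page that does not exist yet,
        # so only compare against the current text when there is one
        if not page.exists() or text != page.get():
            page.put(text, u"Show Statistics")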
    

print "fin"
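
The regular-expression steps can be tried without a wiki connection. The following is a minimal sketch rather than the bot's own code: it reuses a subset of the patterns above (written as raw strings) inside a hypothetical helper, parse_show, and runs them on a made-up page. The sample wikitext, the helper name and the returned dictionary are all assumptions for illustration, and it is written for Python 3, unlike the bot itself.

```python
import re

# Same patterns as the bot, written as raw strings
rea = re.compile(r"^:\s*\W{3,3}(First\s*)?Encores?\W{3,3}.*", re.I | re.S | re.M)
reb = re.compile(r"^:\s*\W{3,3}Encores?\W{3,3}.*", re.I | re.S | re.M)
red = re.compile(r"^:\s*\W{3,3}Second\s*Encores?\W{3,3}.*", re.I | re.S | re.M)
ree = re.compile(r"^[^#\n].*$", re.M)
ref = re.compile(r"\[\[[^\]]*\]\]")
reg = re.compile(r"^\*\s*City:.*$", re.M)
reg2 = re.compile(r"\*\s*City:\s*")
reh = re.compile(r"^\*\s*Venue:.*$", re.M)
reh2 = re.compile(r"\*\s*Venue:\s*")
rei = re.compile(r"(19|20)\d\d[-](0[1-9]|1[012])[-](0[1-9]|[12][0-9]|3[01])")

# Made-up wikitext for a show page (hypothetical data)
SAMPLE_PAGE = (
    "* City: Springfield\n"
    "* Venue: Example Hall\n"
    "\n"
    "# [[Song A]]\n"
    "# [[Song B]]\n"
    ":'''Encore'''\n"
    "# [[Song C]]\n"
)


def parse_show(page, link):
    """Split one show page into setlist links, encore links, city, venue and date."""
    # Drop everything from the first encore marker on, then every non-list line
    setlist = ree.sub("", rea.sub("", page))

    encore = ""
    match = reb.search(page)
    if match is not None:
        encore = ree.sub("", red.sub("", match.group()))

    city = "Unknown"
    match = reg.search(page)
    if match is not None:
        city = reg2.sub("", match.group())

    venue = "Unknown"
    match = reh.search(page)
    if match is not None:
        venue = reh2.sub("", match.group())

    date = ""
    match = rei.search(link)
    if match is not None:
        date = match.group()

    return {
        "setlist": ref.findall(setlist),
        "encore": ref.findall(encore),
        "city": city,
        "venue": venue,
        "date": date,
    }


show = parse_show(SAMPLE_PAGE, "[[Springfield 2008-08-11]]")
print(show)
```

Because the encore patterns use re.S, each one swallows the rest of the page after its marker, which is why deleting the "rea" match leaves only the main setlist.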

Explanation

Here is a step by step explanation of what the above means:

  1. It imports Python functionality (both Python's own and pywikipedia's)
  2. It defines variables:
    • "text" as an empty string
    • "site" as this wiki
    • "linkslist" as an empty list
    • "rea" as a regular expression that finds everything after an Encore(s)/First Encore(s) marker
    • "reb" as a regular expression that finds everything after an Encore(s) marker
    • "rec" as a regular expression that finds everything after a First Encore(s) marker
    • "red" as a regular expression that finds everything after a Second Encore(s) marker
    • "ree" as a regular expression that finds everything not in a numbered list
    • "ref" as a regular expression that finds all internal links
    • "reg" as a regular expression that finds everything on the line starting with bullet point "City:..."
    • "reg2" as a regular expression that finds a bullet point followed by "City:"
    • "reh" as a regular expression that finds everything on the line starting with bullet point "Venue:..."
    • "reh2" as a regular expression that finds a bullet point followed by "Venue:"
    • "rei" as a regular expression that finds a date
  3. It puts the pages in the category Shows into a list called "showslist"
  4. It loops through all the pages in "showslist"; for each one it:
    1. gets the wiki code of the page and saves it as the string "page"
    2. gets the title of the page and saves it as the string "title"
    3. finds all elements of the numbered list in the setlist (not in the encores): it first deletes everything matched by "rea" (the encores), then everything matched by "ree" (lines outside the numbered list), and saves what is left as "Setlist"
    4. finds "Encore", "FirstEncore", "SecondEncore", "City", "Venue" and "Date" by similar methods, using the regular expressions defined above (the date is read from the link to the page rather than from its text)
    5. puts all the internal links in "Setlist" into a list and, for each one, builds a list holding the link along with the Date, City, Venue, which encore it was played in, and the title of the page it was found in; it then adds this list to "linkslist" (a list of lists)
    6. does the same for "Encore", "FirstEncore" and "SecondEncore"
  5. It puts the pages in the category Songs into a list called "songslist"
  6. It loops through all the pages in "songslist"; for each one it:
    1. defines "tablelist" as an empty list
    2. defines "count" as 0
    3. loops through all the lists in "linkslist"; for each one it:
      1. checks whether the song page (from "songslist") is the same as the link stored in the list, ignoring case; if it is, it appends the list to "tablelist" (a list of lists) and adds 1 to "count"; if not, it does nothing
    4. checks whether "count" is not 0; if so (the song was played at least once) it:
      1. sorts the list "tablelist" chronologically
      2. sets the variable "text" to the top section of the output
      3. loops through all the lists in "tablelist"; for each one it:
        1. fills in a row of the table with the relevant information found in the list
      4. adds the footer of the output to the variable "text"
      5. defines the page for the output to be placed on (the "/Concerts" subpage of the song page)
      6. checks that the output is different from what is currently on the page; if it is, it writes the output (the string "text") to the page
  7. It prints "fin" to the Python console so that it is clear the run has finished
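
Step 6 above can be sketched as a function that needs no wiki access. This is an illustration under assumptions, not the bot's code: build_concert_table is a hypothetical name, the entries in the sample "linkslist" are made up, and the table header is shortened (the extra table attributes and the hidden sort-key span are left out). It is written for Python 3.

```python
def build_concert_table(song_title, linkslist):
    """Return wikitext listing the concerts where song_title was played, or None."""
    song_link = "[[" + song_title + "]]"

    # Keep only the entries whose link matches the song, ignoring case
    rows = [entry for entry in linkslist if entry[0].lower() == song_link.lower()]
    if not rows:
        return None

    # YYYY-MM-DD dates sort chronologically when compared as strings
    rows.sort(key=lambda entry: entry[1])

    text = "\"'''" + song_title + "'''\" was played at the following concerts: \n\n"
    text += '{|class="wikitable sortable"\n'
    text += "!'''Date'''!!'''Location'''!!'''Venue'''!!'''Encore?'''\n"
    for link, date, city, venue, encore, show_title in rows:
        text += "|-\n"
        text += "| [[" + show_title + "|" + date + "]]"
        text += "\n| " + city
        text += "\n| " + venue
        text += "\n| " + encore
        text += "\n"
    return text + "|}\n\n[[Category:Show Statistics]]"


# Made-up entries in the same [link, date, city, venue, encore, title] layout
sample_linkslist = [
    ["[[Song A]]", "2008-08-11", "Springfield", "Example Hall", "Yes", "Show 2"],
    ["[[song a]]", "2007-01-05", "Shelbyville", "Sample Club", " ", "Show 1"],
    ["[[Song B]]", "2008-08-11", "Springfield", "Example Hall", " ", "Show 2"],
]

table = build_concert_table("Song A", sample_linkslist)
print(table)
```

Returning None when nothing matches plays the role of the "count" check: a song that was never played gets no output page.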