• Members 58 posts
    April 10, 2023, 7:34 a.m.

    As a starting point which might be useful I enclose the following Python which I've used to convert my JSON download from DPR into separate HTML files. I can then open the HTML in a web browser and all the markup is understood. This is only for the forums, but you could easily change it for the other files. All the files are generated in the current directory - so you might want to run this in a empty subdirectory. You also need to change the line <<PATH TO JSON FILE HERE>> to point to the location of your forums file when you unzip what DPR has sent. Remember on Windows that the file name will need to have any backslashes ('\') escaped with another '\', so \myfiles\forums.json becomes \myfiles\forum.json

    (Note this is meant as a starting point, use at your own risk )

    import sys
    import os
    import json
    import re
    import io
    
    # Open the JSON file, this would end if (for example) forum-posts.json
    with open('<<PATH TO JSON FILE HERE>>', 'r', encoding="utf8") as f:
      data = json.load(f)
    
    for post in data['forumPosts']:
        # Get the forum name, remove "talk" and replace spaces with underscores
        forum=re.sub("_Talk$", "", re.sub(" ", "_", post['forum']))
    
        # Get the post date and shorten it
        dt=post['postDate']
        dt1=f"{dt[0:4]}{dt[5:7]}{dt[8:10]}{dt[11:13]}{dt[14:16]}{dt[17:19]}"
    
        # Get the url (unique number)
        url=post['url'].split('/')
    
        # Get the subject - remove Re: and make it file system safe
        subj=re.sub("^Re:* *", "", post['subject'])
    
    
        # make safe filename, code by Mitch McMabers https://stackoverflow.com/questions/7406102/create-sane-safe-filename-from-any-unsafe-string
        clean = re.sub(r"[/\\?%*:|\"<>\x7F\x00-\x1F]", "-", re.sub(" ", "_", f"{subj}"))
    
        # Create a title for the file
        ntitle="{}_{}_{}_{}.html".format(forum, clean, dt1, url[len(url)-1])
        print (f"Creating {ntitle}")
    
        # Check the file doesn't exist, and then write all the message contents to the file
        try:    
            fp=open(ntitle, "r")
            print (f"File {ntitle} exits")
            sys.exit(1)
        except FileNotFoundError:     
            fp = io.open(ntitle, "w",encoding="utf8")
            fp.write(post['messageTextHtml'])
            fp.close()
    
  • April 10, 2023, 7:38 a.m.

    Useful program. Thanks. We hope to get a way of allowing DPReview posts to be injected into the database, but it will take time, and is not top priority, given that we have lots to do and DPReview will be maintaining the site as a husk.