Topic: the-elite.net's New Blog - Help Saving the Elite's News Archive (Read 571 times)

Imperfect Clark

  • Posts: 2854
  • Worst DC
    • Clark
    • GE
    • PD
    • twitch
    • 2015RankingsDev
    • https://twitter.com/imperfectclark
the-elite.net's New Blog - Help Saving the Elite's News Archive
« on: October 10, 2009, 07:02:00 pm »
I've cleared everything off my to-do list except for one item: importing the old news archive. I'm reaching out to the community to help me finish this last item and preserve the Elite's history.

WordPress actually has an importer for GreyMatter blogs, but our archive files were somehow deleted a few months ago (I have no idea how). Fortunately, I noticed in time to recover almost all of them from Google's cache. The GM-to-WP importer works by reading .cgi files that each contain all the data of one GM post, so to make the import work we need to recreate the .cgi files from the data in the .htm files recovered from Google's cache. Each .cgi file is simply a comma-separated list of all the data for a given entry, which led me to the following process:

- Go through the .htm files and manually copy and paste the data into an Excel spreadsheet, with columns matching the data items in the .cgi files
- Save the spreadsheet as .csv (comma separated). Each row of this file will hold the entire text content of one (lost) .cgi file.
- Create the .cgi files using this data and the naming convention GM uses.
- Upload these files to the old GM directory and import using WordPress
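A minimal Python sketch of step 3, assuming the spreadsheet was saved as a .csv whose rows are in entry order, and assuming GM names each entry file after its entry number zero-padded to eight digits (e.g. 00000001.cgi). That naming scheme and the plain comma join are assumptions to check against a real GM entry file; write_cgi_files is a hypothetical helper, and a field that itself contains a comma would need whatever escaping GM expects, which this sketch does not handle.

```python
import csv
from pathlib import Path


def cgi_filename(entry_number: int) -> str:
    # Assumed GM convention: eight-digit, zero-padded entry number.
    return f"{entry_number:08d}.cgi"


def write_cgi_files(csv_path: str, out_dir: str, encoding: str = "utf-8") -> list:
    """Write one .cgi file per spreadsheet row and return the paths written.

    Rows are assumed to be in entry order, starting at entry 1.
    Excel may save CSV in a Windows code page (e.g. "cp1252"); pass that
    as `encoding` if so.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    number = 0
    # newline="" lets the csv module handle cells Excel wrapped in quotes,
    # such as multi-line post bodies.
    with open(csv_path, newline="", encoding=encoding) as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines without using up an entry number
            number += 1
            path = out / cgi_filename(number)
            # Assumption: the .cgi content is just the fields joined by commas.
            path.write_text(",".join(row), encoding="utf-8")
            written.append(path)
    return written
```

For example, write_cgi_files("news.csv", "gm_archives") would produce 00000001.cgi, 00000002.cgi, and so on (both names are hypothetical), ready for step 4.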

Please let me know if anyone wants to step up and help me out with this. Volunteers will be lavished with appreciation and community acknowledgment.

Feel free to ask questions.
Derek Clark

"i'm 1.1 in my hart, u know that" - Matthijs ten Ham.

vitor

  • Posts: 3644
    • GE
    • PD
« Reply #1 on: October 11, 2009, 11:23:00 am »
You still have the zipped file with all the archived pages, right?

Imperfect Clark

« Reply #2 on: October 12, 2009, 07:38:00 pm »
Not sure where it is, actually. If you emailed it as an attachment, we're fine.

Who wants money to do this? I will throw money at someone.

vitor

« Reply #3 on: October 12, 2009, 07:56:00 pm »
Glad that it hasn't been deleted yet:

http://rapidshare.com/files/273632353/the-elite_archive.rar.html

I also have those on my PC in case anything happens.

Silent Thunder

  • Posts: 1208
  • Less dangerous than invisible lightning.
    • GE
    • PD
« Reply #4 on: October 12, 2009, 11:39:00 pm »
I might take a shot at this.  I'm thinking I'll write a small program that'll take a saved archive page from the old site and extract the information needed to rebuild a greymatter cgi file.  I did a bit of hunting and found some information on how greymatter entry files are put together:

http://web.archive.org/web/20050218100632/www.greymatterforums.com/index.php?topic=5478.0

I have the html pages that greg k linked.  I'm busy tomorrow, but I'll try and start working on it in a couple of days.  Wish me luck.
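One way such an extraction program could look, as a rough Python sketch built on the standard library's html.parser. The class names in FIELD_CLASSES are hypothetical placeholders, not the old site's real markup, and the sketch keeps only plain text (links and other formatting inside a post body would need extra handling).

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "area", "base",
             "col", "embed", "param", "source", "track", "wbr"}


class EntryExtractor(HTMLParser):
    """Collect the text of each news entry from a saved archive page.

    FIELD_CLASSES is a hypothetical mapping: the class names are placeholders
    and must be changed to match the real markup of the archive pages.
    """

    FIELD_CLASSES = {"entry-title": "title", "entry-date": "date",
                     "entry-body": "body"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.entries = []   # one dict per news entry, in page order
        self._field = None  # field currently being read, or None
        self._depth = 0     # open tags inside the current field element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self._field is not None:
                self.handle_data(" ")  # keep words on either side of <br> apart
            return
        if self._field is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            field = self.FIELD_CLASSES.get(cls)
            if field:
                # A title starts a new entry; other fields attach to the latest one.
                if field == "title" or not self.entries:
                    self.entries.append({})
                self._field, self._depth = field, 1
                return

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            entry = self.entries[-1]
            entry[self._field] = entry.get(self._field, "") + data


def extract_entries(html_text):
    parser = EntryExtractor()
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace left over from the page layout.
    return [{k: " ".join(v.split()) for k, v in e.items()} for e in parser.entries]
```

Each returned dict could then become one spreadsheet row or be written straight to a .cgi file.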

vitor

« Reply #5 on: October 13, 2009, 01:10:00 am »
Wow, that's nice, man! Good luck.

Imperfect Clark

« Reply #6 on: October 13, 2009, 01:16:00 am »
Cool. Looks like you found the right file - I borrowed one from the MK64 blog.

If you do this, be prepared to pass along your paypal ID! :]

Silent Thunder

« Reply #7 on: October 14, 2009, 01:45:00 pm »
I'm looking through some of the archived pages right now. One problem I see is that none of them list the author, so I'll probably just assign a generic author name to each entry file. I'll also have to fill in placeholder values for the hour, minute, and second variables.
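A small Python sketch of filling in those missing fields, assuming each recovered entry is a dict with a "date" string like "October 12, 2009". The generic author "admin" is a placeholder; midnight as the default time and the field names (year, month, day, hour, minute, second) are hypothetical, not GM's actual variable names.

```python
from datetime import datetime

DEFAULT_AUTHOR = "admin"    # generic author, since the pages list none
DEFAULT_TIME = (0, 0, 0)    # hypothetical placeholder hour, minute, second


def fill_missing_fields(entry: dict) -> dict:
    """Return a copy of a recovered entry with author and date/time fields set.

    Assumes entry["date"] looks like "October 12, 2009". Values already
    present are kept; only missing ones are filled in.
    """
    day = datetime.strptime(entry["date"], "%B %d, %Y")
    hour, minute, second = DEFAULT_TIME
    filled = dict(entry)
    filled.setdefault("author", DEFAULT_AUTHOR)
    for key, value in (("year", day.year), ("month", day.month),
                       ("day", day.day), ("hour", hour),
                       ("minute", minute), ("second", second)):
        filled.setdefault(key, value)
    return filled
```

Using setdefault means any entry that does turn out to have an author or time keeps it.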

For comments, the author information is there and so is most of the date info.

I also found the code for the PHP script that converts GreyMatter entries into WordPress posts, which will serve as a guide to make sure the .cgi files I make can be converted correctly.
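Before uploading, a quick sanity check could re-read each generated file and count its fields. This Python sketch assumes a plain comma separator and a fixed field count (expected_fields), both of which would need to be confirmed against the importer script; check_cgi_files is a hypothetical helper. A post body containing a comma would also show up as a mismatch, which is a useful warning if the importer splits lines the same naive way.

```python
from pathlib import Path


def check_cgi_files(folder, expected_fields, separator=","):
    """Map each .cgi file whose field count is wrong to the count it has.

    expected_fields and separator are assumptions, to be checked against
    the GreyMatter-to-WordPress importer's source.
    """
    problems = {}
    for path in sorted(Path(folder).glob("*.cgi")):
        fields = path.read_text(encoding="utf-8").split(separator)
        if len(fields) != expected_fields:
            problems[path.name] = len(fields)
    return problems
```

An empty result means every file has the expected shape; anything else lists the files worth opening by hand.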

Currently I'm looking into how I'm going to break down the html to extract all of the information needed to build an entry file.

EDIT:

Clark:
Actually, is it necessary to save the comments on each news post? Parsing out each of the comments looks like it'll be the biggest hurdle in this project. I can probably do it if you really want them saved, but things will be a great deal simpler if I can leave them out. Beyond that, I'm ready to start coding.

vitor

« Reply #8 on: October 14, 2009, 04:59:00 pm »
I personally think it's unnecessary. We already have those saved in a safe place, so we don't need to put them up in the online news archive for people to see (and I highly doubt anyone would notice them anyway).

Imperfect Clark

« Reply #9 on: October 14, 2009, 09:17:00 pm »
Definitely ignore the comments; I had no intention of keeping them.

And not having the author is no big deal. Assign them to "admin".

Silent Thunder

  • Posts: 1208
  • Less dangerous than invisible lightning.
    • GE
    • PD
the-elite.net's New Blog - Help Saving the Elite's News Archive
« Reply #10 on: October 19, 2009, 04:04:00 pm »
Recreating the cgi files is done.  Hopefully I formatted them correctly.  We won't know until Derek attempts to convert them over to wordpress.

I still have to upload the files into the gm folder on the-elite.net.  Give me about an hour or so to get that done.  After that, let me know if there's any problem converting over to wordpress.

EDIT:
The cgi files have been uploaded to the greymatter archive folder.  Hope the conversion to wordpress works!

Imperfect Clark

« Reply #11 on: October 26, 2009, 11:28:00 am »
Wow, sorry I did not see this sooner. I will try the import tonight. Thank you!!