Difference between revisions of "TheMorganReport:Community Portal"

From TheMorganReport

Latest revision as of 16:33, 7 February 2006

Instructions for Editors

How We Did This

Equipment

  • Windows XP box with 2GB RAM, AMD Athlon XP 2700+
  • Plustek OpticBook 3600

Software

  • Adobe Acrobat Professional 7.0
  • Adobe Photoshop CS 8.0
  • Abbyy FineReader 8.0 Professional
  • OpticBook 3600 driver

Procedure

Scan pages from the OpticBook 3600

Scan at 300 dpi, grayscale.

Batch rename pages

Rename each file from "Image xxxx.jpg" to "xxxx-xxxx.jpg" so the name reflects the page numbers (e.g., 502-503.jpg).
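The renaming step can be scripted. The following is a minimal Python sketch, not the tool actually used. It assumes the driver numbers its output "Image 0001.jpg", "Image 0002.jpg", ... in scan order, and that every image is one two-page spread. The names `plan_renames` and `apply_renames` and the starting page number are hypothetical.

```python
import os
import re

# Assumption: scanner output is "Image <sequence number>.jpg", one spread per image.
PATTERN = re.compile(r"^Image (\d+)\.jpg$")


def plan_renames(filenames, first_page):
    """Return (old, new) pairs mapping each scan to a "left-right.jpg" name."""
    scans = sorted(
        (int(m.group(1)), name)
        for name in filenames
        if (m := PATTERN.match(name))
    )
    plan = []
    for i, (_, name) in enumerate(scans):
        left = first_page + 2 * i  # each spread covers two consecutive pages
        plan.append((name, f"{left}-{left + 1}.jpg"))
    return plan


def apply_renames(directory, first_page):
    """Rename the scans in place, refusing to overwrite an existing file."""
    for old, new in plan_renames(os.listdir(directory), first_page):
        target = os.path.join(directory, new)
        if os.path.exists(target):
            raise FileExistsError(target)
        os.rename(os.path.join(directory, old), target)
```

For example, `plan_renames(["Image 0002.jpg", "Image 0001.jpg"], 502)` pairs the first scan with 502-503.jpg and the second with 504-505.jpg. The script plans the full list before it touches any file, so the list can be checked before `apply_renames` runs.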

Batch resize pages

Use Photoshop automation to Save for Web with the "JPEG Low" preset, scaled to 38%.
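The resizing was done with a Photoshop batch. For readers without Photoshop, below is a rough equivalent using the Pillow library. Pillow is an assumed substitute, and `quality=30` is an arbitrary stand-in for the "JPEG Low" preset, not a calibrated match. The sketch also shows the arithmetic behind the setting:

- Scaling a 300 dpi scan to 38% gives about 114 dpi.
- The pixel count falls to 0.38² ≈ 14% of the original.

```python
# Sketch only: Pillow stands in for Photoshop's "Save for Web" batch, and
# quality=30 is an assumed approximation of the "JPEG Low" preset.
SCAN_DPI = 300
SCALE = 0.38


def scaled_size(width, height, scale=SCALE):
    """Pixel dimensions after scaling both sides by `scale` (never below 1 px)."""
    return max(1, round(width * scale)), max(1, round(height * scale))


def effective_dpi(scan_dpi=SCAN_DPI, scale=SCALE):
    """Resolution of the web copy relative to the original page size."""
    return scan_dpi * scale


def resize_for_web(src, dst, scale=SCALE, quality=30):
    """Write a scaled-down JPEG copy of `src` to `dst`."""
    from PIL import Image  # imported here so the size helpers work without Pillow

    with Image.open(src) as img:
        small = img.resize(scaled_size(*img.size, scale), Image.LANCZOS)
        small.save(dst, "JPEG", quality=quality)
```

Keeping the size calculation in its own function makes it easy to check the 38% figure without opening any images.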

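Wiki upload

Batch upload the JPG files to the wiki:

  • Created a wiki.cfg file with two lines:
    • user=
    • password=
  • Kept the Upload perl script in the same directory as wiki.cfg
    • Modified a few lines to point to http://morganreport.org/mediawiki/index.php:
server=> WebServer->new("http://morganreport.org"),
url=> "/mediawiki/index.php",
  • Used OpenOffice Calc to create a list of commands, such as:
perl wiki-upload.pl "502-503.jpg" "Reports of Committee on Foreign Relations 1789-1902 Volume 6 pp502-503"
  • Copied the commands into a DOS batch file and executed it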
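The spreadsheet step can also be done in a few lines of code. The Python sketch below generates the same `perl wiki-upload.pl` lines, one per two-page spread, and writes them to a batch file. The function names, the batch file name, and the page range in the example are hypothetical. The description text copies the command shown in the upload step.

```python
# Sketch of the command list the OpenOffice Calc sheet produced.
# Hypothetical: the function names, the batch file name, and the page range.
DESCRIPTION = ("Reports of Committee on Foreign Relations 1789-1902 "
               "Volume 6 pp{left}-{right}")


def upload_commands(first_page, last_page):
    """Yield one wiki-upload.pl command per two-page spread."""
    for left in range(first_page, last_page, 2):
        right = left + 1
        description = DESCRIPTION.format(left=left, right=right)
        yield f'perl wiki-upload.pl "{left}-{right}.jpg" "{description}"'


def write_batch(path, first_page, last_page):
    """Write the commands to a DOS batch file (CRLF line endings)."""
    with open(path, "w", newline="\r\n") as batch:
        for command in upload_commands(first_page, last_page):
            batch.write(command + "\n")
```

For example, `write_batch("upload.bat", 502, 505)` would write two commands, one for 502-503.jpg and one for 504-505.jpg.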

Wiki stubs

  • Navigate to the page in the wiki and insert the stub code
    • For the first page, set previous=Main Page; for the last page, set next=Main Page
{{Double Page|previous=<xxx>-<xxx>|current=<xxx>-<xxx>|next=<xxx>-<xxx>}}
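Each stub depends only on its neighbouring spreads, so the stub text can be generated in advance and pasted into each page. Below is a small Python sketch of that idea. It assumes the spreads are listed in reading order. The function `double_page_stubs` and the example spread names are hypothetical.

```python
def double_page_stubs(spreads):
    """Map each spread name (e.g. "502-503") to its {{Double Page}} stub text.

    The first spread links back to Main Page, and the last links forward to it.
    """
    stubs = {}
    for i, current in enumerate(spreads):
        previous = spreads[i - 1] if i > 0 else "Main Page"
        following = spreads[i + 1] if i + 1 < len(spreads) else "Main Page"
        stubs[current] = (
            "{{Double Page|previous=" + previous
            + "|current=" + current
            + "|next=" + following + "}}"
        )
    return stubs
```

String concatenation is used instead of an f-string so the literal `{{` and `}}` of the template need no escaping.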

Batch OCR full-resolution pages

Create a PDF with FineReader.

Wiki upload text

  • Click the "Template:<xxx>-<xxx>" link and copy the text from the PDF into it
  • Spell-check and copy-edit the text