Difference between revisions of "TheMorganReport:Community Portal"

From TheMorganReport
 
(16 intermediate revisions by the same user not shown)
Line 1:
=Equipment=
+
=[[Instructions for Editors]]=
 +
 
 +
=How We Did This=
 +
==Equipment==
 
*Windows XP box with 2GB RAM, AMD Athlon XP 2700+
*HP Scanjet 3670
+
*Plustek OpticBook 3600
=Software=
+
 
*HP Scanjet software (bundled)
+
==Software==
 
*Adobe Acrobat Professional 7.0
 
*Adobe Photoshop CS 8.0
=Procedure=
+
*Abbyy FineReader 8.0 Professional
*All Programs...Hewlett-Packard...Scanjet Scanner 36X0 Series...Photo & Imaging Director
+
*OpticBook 3600 driver
**Choose "Scan Document"
+
 
***Select - Scan for editable text (OCR)? Yes
+
==Procedure==
***Select - Original contains graphics? No
+
===scan pages from opticbook 3600===
***Scan to: "Save to file"
+
300dpi, grayscale
***Click "Scan"
+
===batch rename pages===
**Rotate as appropriate with buttons on left hand side
+
from "Image xxxx.jpg" to "xxxx-xxxx.jpg" to reflect page numbers
**Click "Accept"
+
===batch resize pages===
**Click "No" when asked to scan another image
+
using Photoshop automation: Save for Web, JPEG Low preset, scaled to 38%
**Save file as <page numbers>.pdf (e.g., 366-367.pdf)
+
===wiki upload===
*Open pdf file
+
batch upload jpg files to wiki
**Tools...Advanced Editing...TouchUp Object Tool
+
 
**Left-click on page to select, then right-click and choose "Edit Image"
+
*created wiki.cfg file with two lines:
**Ignore warning about flattening image (Check "Don't show again" and click "OK")
+
**user=
**File...Save for web...
+
**password=
**Choose "JPEG Low" preset
+
*Had [[Upload perl script]] in the same directory as wiki.cfg
**Adjust image size to 75%, click "Apply"
+
**Modified a few lines to point to http://morganreport.org/mediawiki/index.php:
**Click "Save"
+
server=> WebServer->new("http://morganreport.org"),
**Save file as <page numbers>.jpg (e.g., 366-367.jpg)
+
url=> "/mediawiki/index.php",
*Upload file to wiki
+
 
**Description: Reports of Committee on Foreign Relations 1789-1901 Volume 6 pp<xxx>-<xxx>
+
*Used OpenOffice Calc to create a list of commands
*Add new page link on wiki MainPage
+
perl wiki-upload.pl "502-503.jpg" "Reports of Committee on Foreign Relations 1789-1902 Volume 6 pp502-503"
[[<xxx>-<xxx>]]
+
 
*Edit new page in wiki, and put in stub code
+
*Copied the commands into a DOS batch file and ran it
 +
 
 +
===wiki stubs===
 +
*Navigate to page in wiki, and put in stub code
 
**for first page, make previous=Main Page, for last page, make next=Main Page
 
<pre>{{Double Page|previous=<xxx>-<xxx>|current=<xxx>-<xxx>|next=<xxx>-<xxx>}}</pre>
 +
===batch OCR full resolution pages===
 +
create PDF with FineReader
 +
===wiki upload text===
 
*Click on the "Template:<xxx>-<xxx>" link and copy the text from the pdf
 
*Spell-check and copy edit the text
**fix hyphenated words
 
**put a blank line before each paragraph
 
*Add the page to the bottom of the [[Transcribed Morgan Report]]
 
{{<xxx>-<xxx>}}
 
 
=Notes=
 
*Scanning in at 300ppi, 256 gray shades (8-bit grayscale)
 
*I'm not uploading the raw PDF files, since they're about 5 times as large as the jpgs
 
=Instructions for Editors=
 
On any given page, there will be some common navigation:
 
*A "Previous Page" link
 
*A "Next Page" link
 
*A "Template:<xxx>-<xxx>" link
 
*An image thumbnail
 
 
To help edit, follow this procedure:
 
#Click on the image thumbnail of the page. This slightly enlarges the image. Click on the image again, and it should be large enough to read. Leave this open in a browser window.
 
#In another browser window, open the same page and click on the "Template:<xxx>-<xxx>" link. This brings you to a page where you can edit the text. Click on the "edit" button at the top to begin editing. Compare the text in the editor with the image in the other window, and make corrections where necessary.
 
 
Minor procedural notes:
 
*To indicate the beginning of a page, use an html comment like so:
 
<pre><!--pxxx--></pre>
 
*If a word is hyphenated across a page break (inter-rupted, for example), indicate the page break thus:
 
<pre>inter<!--pxxx-->rupted</pre>
 
*Rejoin words that are hyphenated only because of a line break
 
*Put one blank line between paragraphs
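
The page-marker convention in the notes above can be sketched in a few lines of Python. This is only an illustration, not a tool used on the wiki: the function name `join_pages` is hypothetical, and it assumes that a hyphen at the very end of a page always marks a word split across the page break, which will be wrong for genuine hyphenated compounds.

```python
def join_pages(previous_text: str, next_text: str, next_page: int) -> str:
    """Join two pages of transcribed text with a <!--pNNN--> page marker.

    Assumption (not from the wiki): a trailing hyphen on the previous page
    means the last word continues on the next page.
    """
    marker = f"<!--p{next_page}-->"
    head = previous_text.rstrip()
    tail = next_text.lstrip()
    if head.endswith("-"):
        # Word split across the page break: drop the hyphen and put the
        # marker between the two halves, e.g. inter<!--p367-->rupted.
        return head[:-1] + marker + tail
    # Otherwise the marker simply opens the new page on its own line.
    return head + "\n" + marker + tail


if __name__ == "__main__":
    print(join_pages("the session was inter-", "rupted by a recess", 367))
```

A human check is still needed afterwards, since the sketch cannot tell a line-break hyphen from a real one.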
 

Latest revision as of 16:33, 7 February 2006

Instructions for Editors

How We Did This

Equipment

  • Windows XP box with 2GB RAM, AMD Athlon XP 2700+
  • Plustek OpticBook 3600

Software

  • Adobe Acrobat Professional 7.0
  • Adobe Photoshop CS 8.0
  • Abbyy FineReader 8.0 Professional
  • OpticBook 3600 driver

Procedure

scan pages from opticbook 3600

300dpi, grayscale

batch rename pages

from "Image xxxx.jpg" to "xxxx-xxxx.jpg" to reflect page numbers
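
The page does not say which tool did the renaming. As a rough sketch, the mapping could look like the Python below. It assumes (none of this is stated on the page) that scans are numbered consecutively, that each scan covers two facing pages, and that the page number of the first scan is known; `page_range_name`, `first_page` and `first_index` are hypothetical names.

```python
import re


def page_range_name(scan_name: str, first_page: int, first_index: int = 1) -> str:
    """Map a scanner file name such as 'Image 0001.jpg' to '<left>-<right>.jpg'.

    Assumes two facing pages per scan and consecutive scan numbering.
    """
    match = re.fullmatch(r"Image (\d+)\.jpg", scan_name)
    if match is None:
        raise ValueError(f"unexpected scan name: {scan_name!r}")
    offset = int(match.group(1)) - first_index
    left = first_page + 2 * offset
    return f"{left}-{left + 1}.jpg"


if __name__ == "__main__":
    # With the first scan showing pages 366-367:
    print(page_range_name("Image 0001.jpg", first_page=366))
```

Applying the result with `os.rename` over a directory listing would complete the batch step; skipped or repeated scans would break the arithmetic and need manual correction.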

batch resize pages

using Photoshop automation: Save for Web, JPEG Low preset, scaled to 38%
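
To see what the 38% scaling means for a 300dpi scan, the arithmetic can be written out as below. The 2550 x 3300 pixel example is a hypothetical scan size (8.5 x 11 inches at 300dpi), not a measurement from this project, and the function names are illustrative only.

```python
def scaled_size(width_px: int, height_px: int, percent: int = 38) -> tuple:
    """Pixel dimensions after scaling both sides by the given percentage."""
    return round(width_px * percent / 100), round(height_px * percent / 100)


def effective_dpi(scan_dpi: int = 300, percent: int = 38) -> float:
    """Resolution of the scaled image relative to the original paper size."""
    return scan_dpi * percent / 100


if __name__ == "__main__":
    print(scaled_size(2550, 3300))  # hypothetical scan size
    print(effective_dpi())
```

At 38%, a 300dpi scan ends up at roughly 114 pixels per inch of original page, which suits on-screen reading; the full-resolution scans are kept for OCR.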

wiki upload

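batch upload jpg files to wiki

  • created wiki.cfg file with two lines:
    • user=
    • password=
  • Had Upload perl script in the same directory as wiki.cfg
    • Modified a few lines to point to http://morganreport.org/mediawiki/index.php:
server=> WebServer->new("http://morganreport.org"),
url=> "/mediawiki/index.php",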
  • Used OpenOffice Calc to create a list of commands
perl wiki-upload.pl "502-503.jpg" "Reports of Committee on Foreign Relations 1789-1902 Volume 6 pp502-503"
  • Copied the commands into a DOS batch file and ran it
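
The command list was built in OpenOffice Calc; the same lines could also be generated with a short script. The Python sketch below reproduces the command format shown above. The function names and the page range in the example are hypothetical, and it assumes every file covers two consecutive pages.

```python
def upload_command(left_page: int) -> str:
    """Build one wiki-upload.pl command line for a two-page scan."""
    name = f"{left_page}-{left_page + 1}"
    description = (
        "Reports of Committee on Foreign Relations 1789-1902 "
        f"Volume 6 pp{name}"
    )
    return f'perl wiki-upload.pl "{name}.jpg" "{description}"'


def batch_lines(first_left: int, last_left: int) -> list:
    """One command per double page, from first_left to last_left inclusive."""
    return [upload_command(page) for page in range(first_left, last_left + 1, 2)]


if __name__ == "__main__":
    # Print the commands; redirect the output into a .bat file to run them.
    for line in batch_lines(502, 506):
        print(line)
```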

wiki stubs

  • Navigate to page in wiki, and put in stub code
    • for first page, make previous=Main Page, for last page, make next=Main Page
{{Double Page|previous=<xxx>-<xxx>|current=<xxx>-<xxx>|next=<xxx>-<xxx>}}
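
The stub text for every page can be derived from an ordered list of page names, including the Main Page rule for the first and last pages. The Python below is a sketch under that assumption; `double_page_stub` is a hypothetical helper, and the stubs were not necessarily produced this way.

```python
def double_page_stub(pages: list, index: int) -> str:
    """Return the {{Double Page}} stub for pages[index].

    The first page links back to Main Page and the last page links
    forward to Main Page, as described above.
    """
    previous = pages[index - 1] if index > 0 else "Main Page"
    following = pages[index + 1] if index < len(pages) - 1 else "Main Page"
    return "{{Double Page|previous=%s|current=%s|next=%s}}" % (
        previous,
        pages[index],
        following,
    )


if __name__ == "__main__":
    names = ["366-367", "368-369", "370-371"]  # example page names
    for i in range(len(names)):
        print(double_page_stub(names, i))
```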

batch OCR full resolution pages

create PDF with FineReader

wiki upload text

  • Click on the "Template:<xxx>-<xxx>" link and copy the text from the pdf
  • Spell-check and copy edit the text