Missing Metadata in SharePoint with PDFs
I recently got a call from my end users stating that their metadata was mysteriously disappearing when they worked with PDFs stored in SharePoint.
Take the following scenario. A user creates a PDF by scanning a paper document using Adobe Acrobat Professional. They then upload that document into SharePoint and assign it the required metadata (shown below).
A few days later, the user decides they need to edit the PDF. They open the PDF by selecting Edit Document from the drop-down menu.
Once Adobe Acrobat Professional launches, they run OCR on the document so they can edit it. They select OCR Text Recognition > Recognize Text Using OCR from the Document menu and let it grind away for a few minutes. With the OCR complete, the user makes the edits and simply clicks Save.
After closing Adobe Acrobat Professional, the SharePoint document library refreshes and all the metadata is gone (see below).
How unfortunate!
Cause
After researching the issue, I determined that when you perform OCR on a document that has more than one page, Adobe Acrobat actually deletes the original document (removing all metadata associated with it) and creates a brand spanking new file in its place. You can find evidence of this by looking in the SharePoint recycle bin.
A similar test on a file stored on a local drive had the same result. If you right-click a PDF document, go to Properties, fill out the Summary tab, and then perform the steps above, you will also lose your metadata.
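To make the difference between overwriting a file and replacing it concrete, here is a minimal Python sketch. It is a toy in-memory model, not SharePoint's or Acrobat's actual code: the ToyLibrary class, its method names, and the "Department" column are all made up for illustration. It only assumes what was observed above, namely that metadata belongs to the library item and that a delete followed by a create produces a new item.

```python
import itertools


class ToyLibrary:
    """Hypothetical in-memory stand-in for a document library (not a real API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.items = {}        # file name -> {"id", "content", "metadata"}
        self.recycle_bin = []  # deleted items end up here

    def upload(self, name, content, metadata=None):
        # A new item always gets a new ID and only the metadata passed in.
        item = {"id": next(self._ids), "content": content,
                "metadata": dict(metadata or {})}
        self.items[name] = item
        return item

    def overwrite_in_place(self, name, content):
        # Same item, new bytes: the metadata is untouched.
        self.items[name]["content"] = content

    def delete_and_recreate(self, name, content):
        # Assumed multi-page OCR save behavior: the old item (and its
        # metadata) goes to the recycle bin and a bare new item replaces it.
        self.recycle_bin.append(self.items.pop(name))
        return self.upload(name, content)


library = ToyLibrary()
library.upload("scan.pdf", b"scanned pages", {"Department": "Finance"})

library.overwrite_in_place("scan.pdf", b"small edit")
print(library.items["scan.pdf"]["metadata"])  # metadata survives an overwrite

library.delete_and_recreate("scan.pdf", b"pages with OCR text")
print(library.items["scan.pdf"]["metadata"])  # metadata is now empty
print(len(library.recycle_bin))               # the original item was deleted
```

In this model the file name never changes, which is why the library looks the same to the user apart from the missing metadata.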
I called Adobe support about this issue, and they first responded with “Please explain how SharePoint works, we are unfamiliar with it.” After my explanation, and after being put on hold several times, I finally got this response: “We believe the problem is caused by SharePoint.”
I attempted numerous times to convince the phone support operator that it was not a SharePoint-specific problem, but I was unsuccessful.
Solution
I came up with a solution that meets the needs of my users and thought I would share it. It is not revolutionary, but it uses built-in SharePoint functionality.
If your users need to manipulate a PDF document in any way, have them follow these steps.
Check out the document to the local drafts folder
This process will actually put the PDF in the user’s My Documents\SharePoint Drafts folder on their computer. Any further edits to the PDF by this user will be made to their local copy.
Once they have completed the edits, they simply check the file back in. This moves the file from their local computer back into the SharePoint library, with all PDF edits and metadata intact. (Yippee!)
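The steps above can be sketched in Python to show why the workaround plausibly holds up. This is a toy model under stated assumptions, not SharePoint's implementation: the check_out, acrobat_style_save, and check_in helpers and the "Department" column are hypothetical, and the idea that check-in copies the local bytes into the existing library item is my reading of the behavior, not something confirmed by Microsoft or Adobe.

```python
import os
import tempfile


def check_out(item, drafts_dir):
    """Copy the item's content to a local drafts folder (hypothetical helper)."""
    path = os.path.join(drafts_dir, item["name"])
    with open(path, "wb") as f:
        f.write(item["content"])
    item["checked_out"] = True
    return path


def acrobat_style_save(path, new_content):
    """Assumed OCR save behavior: delete the file, then create a new one."""
    os.remove(path)
    with open(path, "wb") as f:
        f.write(new_content)


def check_in(item, path):
    """Copy the local bytes back into the SAME item, then remove the draft."""
    with open(path, "rb") as f:
        item["content"] = f.read()
    os.remove(path)
    item["checked_out"] = False


# The library item keeps its metadata the whole time; only the local
# copy is ever deleted and recreated.
item = {"name": "scan.pdf", "content": b"scanned pages",
        "metadata": {"Department": "Finance"}, "checked_out": False}

with tempfile.TemporaryDirectory() as drafts:
    local_path = check_out(item, drafts)
    acrobat_style_save(local_path, b"pages with OCR text")
    check_in(item, local_path)

print(item["content"])
print(item["metadata"])
```

The point of the sketch is that the destructive delete-and-recreate step only ever touches the draft on disk, so the item in the library is never replaced.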
Conclusion
I have mainly seen this problem occur with the OCR process and sometimes with the amend process when creating a PDF from a scanner. To be safe, I have recommended that my users follow the solution above whenever they manipulate a PDF document. It is not perfect, but it works.
March 27, 2008 Posted by Paul Liebrand | SharePoint | adobe, metadata, pdf | 15 Comments