How to automatically populate metadata fields in SharePoint using the PINGAR API
Posted by admin on March 7, 2011

This is the first in a series of technical 'Deep Dive' blog posts. These posts give developers practical advice on how to use the PINGAR API components in real-world applications.

This post was written at the 2011 Australian SharePoint Conference and shows how the PINGAR API can help SharePoint users take advantage of the metadata-driven navigation and refinement features that enable a more productive workflow.

Extracting metadata using PINGAR API
Metadata, by definition, is "data about data". The metadata of a document can be its author, format, creation date and location, all of which are relatively simple to determine. It can also be information about the document's content, such as tags, keywords and taxonomy entries, or the names of people, organizations and locations mentioned in it. This kind of metadata helps greatly when narrowing down search results, but entering it by hand takes considerable effort. This is where PINGAR's natural language processing technology comes in.

Using the PINGAR API feature GetKeywords, you can automatically identify keywords; using GetTaxonomyHierarchy, you can map a document to a pre-defined category hierarchy; and using GetEntities, you can identify occurrences of many kinds of entities: from people, organizations and place names to fully parsed postal addresses, or pattern-based entities such as dates, times, phone numbers, or credit card numbers. Up to 10 documents can be sent to the PINGAR API per request, and you can expect a response within seconds.
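The 10-documents-per-request limit means a larger collection has to be split into batches before it is sent. Here is a minimal Python sketch of that batching step. The names `batches`, `process_all` and `send_batch` are hypothetical and not part of the PINGAR API; `send_batch` stands in for whatever code actually calls the service and is assumed to return one result per document.

```python
from typing import Callable, Iterator, List, Sequence

MAX_DOCS_PER_REQUEST = 10  # per-request limit stated in this post


def batches(documents: Sequence[str],
            size: int = MAX_DOCS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(documents), size):
        yield list(documents[start:start + size])


def process_all(documents: Sequence[str],
                send_batch: Callable[[List[str]], list]) -> list:
    """Send every document, one batch at a time, and collect the results.

    `send_batch` is a hypothetical callable supplied by the caller; it is
    assumed to return one result per document in the batch.
    """
    results = []
    for batch in batches(documents):
        results.extend(send_batch(batch))
    return results
```

With 23 documents, for example, this would issue three requests of 10, 10 and 3 documents.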

Adding documents to SharePoint
If you need to add documents from a given location on your hard drive to SharePoint, you can use the ready-to-go PowerShell script UploadDocumentsWithMetadata.ps1, which we prepared from a template script presented by the PowerShell SharePoint evangelists at last year's conference.

In this script, we set up four additional fields in a SharePoint collection: Keywords, People, Organisations and Locations, each holding a multi-line string value. We then adapted the script to use the New-WebServiceProxy cmdlet, which creates a proxy for the PINGAR API web service. The script uses this proxy to call the PINGAR API methods GetKeywords and GetEntities and fill in the metadata fields. Feel free to download the script and try it out in your SharePoint installation.
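The step of turning keyword and entity results into the four multi-line fields can be sketched independently of PowerShell. The Python below is only an illustration: it assumes entities arrive as (type, text) pairs, and the type labels "Person", "Organisation" and "Location" are placeholders, not the labels the PINGAR API actually returns. The function names are hypothetical too.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed entity type labels (placeholders) mapped to the fields from the post.
FIELD_FOR_ENTITY_TYPE = {
    "Person": "People",
    "Organisation": "Organisations",
    "Location": "Locations",
}


def _add_unique(items: List[str], value: str) -> None:
    """Append a trimmed value unless it is empty or already present."""
    value = value.strip()
    if value and value not in items:
        items.append(value)


def build_metadata_fields(keywords: Iterable[str],
                          entities: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return one newline-separated string per SharePoint field."""
    values: Dict[str, List[str]] = {
        "Keywords": [], "People": [], "Organisations": [], "Locations": [],
    }
    for keyword in keywords:
        _add_unique(values["Keywords"], keyword)
    for entity_type, text in entities:
        field = FIELD_FOR_ENTITY_TYPE.get(entity_type)
        if field is not None:  # other entity types (dates, times, ...) are skipped
            _add_unique(values[field], text)
    return {field: "\n".join(items) for field, items in values.items()}
```

Duplicates are dropped and first-seen order is kept, so each field value stays short and stable across repeated runs on the same document.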

Our partner Provoke, the leading New Zealand online experience consultancy, has come up with an alternative solution. Within a day, they implemented a custom action for SharePoint 2010 that talks to the PINGAR API Entity Extraction component and adds entity metadata to existing documents within a SharePoint document library. When the user clicks the PINGAR Entity Extraction button in the dropdown menu, the selected document is sent to the PINGAR API for analysis. PINGAR then replies with the set of entities found within the document. At the moment, documents are limited to plain text, but this could be extended to Word, PDF and other formats.

PingarAPI_SharePointItem.jpg

Brendon Ford from Provoke says:

We have chosen the PINGAR Entity Extraction feature because many businesses have hundreds of legacy documents with limited associated metadata. Automatic extraction and population of this data will improve users' search efficiency. Our added feature enables users to filter search results by entities contained within the document, such as people, addresses or organizations.

 

Here is an example of a document processed by the PINGAR API service. The example shows person, address, date, and time entities.

PingarAPI_EnteredMetadata.jpg

This concept could be extended to bulk-process whole document libraries at once. Terms could also be added to SharePoint Managed Metadata term stores, enabling search by tag and filter by tag.
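To make the "filter by tag" idea concrete, here is a small Python sketch of the underlying data structure: an index from each extracted term to the documents that contain it. This is a plain in-memory stand-in, not the SharePoint Managed Metadata API. The function names and sample file names are hypothetical, and case-insensitive matching is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


def build_tag_index(doc_terms: Mapping[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each normalised term to the set of documents that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_name, terms in doc_terms.items():
        for term in terms:
            key = term.strip().lower()  # assumption: tags match case-insensitively
            if key:
                index[key].add(doc_name)
    return dict(index)


def filter_by_tag(index: Mapping[str, Set[str]], tag: str) -> List[str]:
    """Return the sorted names of the documents tagged with `tag`."""
    return sorted(index.get(tag.strip().lower(), set()))


if __name__ == "__main__":
    # Hypothetical extraction results for three documents.
    index = build_tag_index({
        "contract.txt": ["Provoke", "Auckland"],
        "minutes.txt": ["provoke"],
        "notes.txt": [],
    })
    print(filter_by_tag(index, "Provoke"))
```

In a real deployment the term store and search refiners would take over this role; the sketch only shows why extracted terms make tag-based filtering cheap once they are stored per document.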

Sign up for the free Sandbox account with PINGAR API and start improving the SharePoint experience today! 
