Find taxonomy entries in text

Transforms text into structured data: which entities the text mentions and what is known about them.
GetTaxonomyHierarchy
input: string[] Documents, enum DocumentFormat, bool IncludeSingleSiblings, 
       string TaxonomyName
output: TaxonomySubTrees[] TaxonomyHierarchyResult

Takes a list of Documents and their DocumentFormat (text or URL) as input. A taxonomy hierarchy is extracted from the content of each document and returned as one TaxonomySubTrees entry per document, each containing an array of TaxonomySubTree.

TaxonomySubTree stores the parent category and one or more child entities. Both are stored as MappedConcept instances, which contain the Title of the taxonomy node, a collection of ConceptOccurrence instances recording where each occurrence was found, and Scoring information.
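
The parent/child shape described above can be sketched as follows. This is only an illustration: the MappedConcept and TaxonomySubTree classes here are simplified, hypothetical stand-ins for the types generated from the service (only Title, Parent and Children are modelled, and the real classes may differ), the Flatten helper is not part of the API, and the sample titles are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the API types described above. The real
// generated classes also carry ConceptOccurrence and Scoring data.
public class MappedConcept
{
    public string Title { get; set; } = "";
}

public class TaxonomySubTree
{
    public MappedConcept Parent { get; set; } = new MappedConcept();
    public MappedConcept[] Children { get; set; } = Array.Empty<MappedConcept>();
}

public static class SubTreeShape
{
    // Illustrative helper (not part of the API): flattens subtrees into
    // "Parent > Child" strings, one per child entity.
    public static List<string> Flatten(IEnumerable<TaxonomySubTree> subtrees)
    {
        return subtrees
            .SelectMany(s => s.Children.Select(c => s.Parent.Title + " > " + c.Title))
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data, used only to show the structure.
        var subtrees = new[]
        {
            new TaxonomySubTree
            {
                Parent = new MappedConcept { Title = "Analgesics" },
                Children = new[]
                {
                    new MappedConcept { Title = "Aspirin" },
                    new MappedConcept { Title = "Ibuprofen" }
                }
            }
        };

        foreach (string line in Flatten(subtrees))
        {
            Console.WriteLine(line);
        }
    }
}
```

Flattening to parent/child pairs is one convenient way to store or index the result; the nested form returned by the service keeps the grouping by category.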

IncludeSingleSiblings specifies whether or not categories with only a single sibling are returned.
TaxonomyName specifies which taxonomy the input text is matched against. Use the value all to match against every taxonomy at once.

Taxonomy Area        Taxonomy Description                                             TaxonomyName value
Alcohol & Drugs      Alcohol and Drugs Thesaurus by the NIAAA                         aod
Government           Integrated Public Sector Vocabulary (IPSV) by esd, UK            ipsv
Pharmaceuticals      Conditions, Symptoms & Drugs by Pingar                           pharma
Public Affairs       Australian Public Affairs Information Service thesaurus by NLA   apais
Any taxonomy area    Use all taxonomies at once                                       all
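
A client may want to check a TaxonomyName value before sending a request. The following is a minimal sketch under stated assumptions: the TaxonomyNames class and its IsBuiltIn and Describe methods are hypothetical helpers, not part of the Pingar API, and the comparison is exact because this page does not say whether the service ignores case. Names of customer-supplied taxonomies are not covered by the check.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical client-side helper; not part of the Pingar API.
public static class TaxonomyNames
{
    // TaxonomyName values and descriptions copied from the table above.
    private static readonly Dictionary<string, string> BuiltIn =
        new Dictionary<string, string>
        {
            { "aod", "Alcohol and Drugs Thesaurus by the NIAAA" },
            { "ipsv", "Integrated Public Sector Vocabulary (IPSV) by esd, UK" },
            { "pharma", "Conditions, Symptoms & Drugs by Pingar" },
            { "apais", "Australian Public Affairs Information Service thesaurus by NLA" },
            { "all", "Use all taxonomies at once" }
        };

    // True if the value is one of the built-in names listed in the table.
    // Exact match only (assumption: case handling by the service is unknown).
    public static bool IsBuiltIn(string name)
    {
        return name != null && BuiltIn.ContainsKey(name);
    }

    // Returns the description for a built-in name, or null if it is not listed.
    public static string Describe(string name)
    {
        string description;
        if (name != null && BuiltIn.TryGetValue(name, out description))
        {
            return description;
        }
        return null;
    }

    public static void Main()
    {
        foreach (string candidate in new[] { "pharma", "Pharma", "medical" })
        {
            string label = IsBuiltIn(candidate)
                ? Describe(candidate)
                : "not a built-in taxonomy";
            Console.WriteLine(candidate + ": " + label);
        }
    }
}
```

Checking locally avoids a round trip for an obviously mistyped name, but the service remains the authority on which values it accepts.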

A taxonomy can also be supplied by the customer, in XML or RDF (SKOS) format.

See also Entity extraction.

Sample code in C#:

// Build the request: credentials, documents and taxonomy options.
PingarAPIRequest request = new PingarAPIRequest();
request.AppID = "your app id";
request.AppKey = "your app key";
request.EntityExtraction = new EntityExtractionRequest();
request.EntityExtraction.Documents = new string[] { "document text" };
request.EntityExtraction.DocumentsFormat = DocumentFormat.Text;
request.EntityExtraction.IncludeSingleSiblings = false;
request.EntityExtraction.TaxonomyName = "all"; // match against every taxonomy
request.Language = Language.EN;

// Call the service.
PingarAPIServiceSoapClient pingarAPI = new PingarAPIServiceSoapClient();
PingarAPIResponse response = pingarAPI.GetTaxonomyHierarchy(request);
int count = 0; // index of the document being printed
if (response.Error == null)
{
    // One TaxonomySubTrees entry is returned per input document.
    foreach (TaxonomySubTrees document in 
             response.EntityExtraction.TaxonomyHierarchyResult)
    {
        Console.WriteLine("Taxonomy Hierarchy For Document " + count);
        foreach (TaxonomySubTree subtree in document.subtrees)
        {
            // Print the parent category, then its child entities indented.
            Console.WriteLine(subtree.Parent.Title);
            foreach (MappedConcept child in subtree.Children)
            {
                Console.WriteLine("  " + child.Title);
            }
        }
        count++;
    }
}
else
{
    // Report the failure instead of silently printing nothing.
    Console.WriteLine("Request failed: " + response.Error);
}

 