Entities Identified by Pingar

This is the full list of entities identified by the Pingar Entity Extraction module.

Proper nouns

Proper nouns are unique entities such as New Zealand, Madonna, University of Auckland, or Odyssey. Unlike common nouns, which describe classes of objects (locations, people, organizations, books), proper nouns cannot be covered by a dictionary, because new entities of this kind are created every day.
Pingar identifies proper nouns based on the context of words and their intrinsic qualities, such as capitalization and typical name forms.

Person - Full or short name of a person mentioned in text - Complete

Organization - Name of an organization, company, or institution recognized in text - Beta

Location - Name of a geographic location recognized in text - Beta

Coming soon are entities such as Events, Products and Brands, Companies, Educational Institutions, Venues, Movies, Songs, Performers, Art, Books, Writers, Documents, Awards and many more.

Pattern-based entities

These entities are identified using pattern matching (regular expressions), as well as analysis of their context to avoid ambiguities.

Time
Date
People’s age
Money amounts
Bank account number
Credit card number
IRD number
URLs
Email addresses
Phone numbers
GPS locations
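
As a rough illustration of the pattern-matching step, the following Python sketch finds three of the entity types listed above. The patterns, the PATTERNS table and the find_pattern_entities function are hypothetical simplifications, not Pingar's actual expressions, and the sketch leaves out the context analysis that Pingar uses to resolve ambiguities.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# They are not the expressions Pingar uses.
PATTERNS = {
    "Email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "URL": re.compile(r"https?://[^\s<>\"]+"),
    "Time": re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b"),
}


def find_pattern_entities(text):
    """Return (type, matched text, start, end) tuples sorted by start position."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda item: item[2])


sample = "Email info@example.com before 14:30 or visit http://example.com/docs today."
for entity in find_pattern_entities(sample):
    print(entity)
```

A production extractor would need stricter patterns and a disambiguation step, for example to tell a time such as 14:30 from a ratio or a score written the same way.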


Addresses

We identify postal addresses listed in text by determining their sub-parts such as:

Unit - A unit, apartment, suite, room, or flat, e.g. “Suite 2”
Level - A level or floor, e.g. “Level 1”
Street - Street number, street name and a street type, e.g. “32 Beach Road”
PO Box - Postal delivery address, e.g. “Private Bag 42”
Suburb - A part of a city, e.g. “Hillcrest”
Locality - A city, town or village, e.g. “Auckland”
Post Code - A postal index, e.g. “3216”
State - A part of a country, e.g. “Waikato” or “NSW”
Country - A country, e.g. “Australia” or “NZ”


A valid address does not need to contain all of these sub-parts; a selection of them is enough.
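
One way to picture an address made up of optional sub-parts is a record in which every field may be left empty. The Address class and its field names below are hypothetical and only mirror the list above; they are not part of the Pingar API.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Address:
    """Hypothetical container mirroring the address sub-parts listed above."""
    unit: Optional[str] = None
    level: Optional[str] = None
    street: Optional[str] = None
    po_box: Optional[str] = None
    suburb: Optional[str] = None
    locality: Optional[str] = None
    post_code: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None

    def filled_parts(self):
        """Return only the sub-parts that are present, in declaration order."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


# An address with only four of the nine sub-parts filled in.
address = Address(street="32 Beach Road", suburb="Hillcrest",
                  post_code="3216", country="NZ")
print(address.filled_parts())
```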

Output format for Entity Extraction

Entities - A struct to contain lists of Entity objects
Entity - An object describing the entity found in text
string Entity.Title - A well-formed title of the entity (as it appears in text)
string Entity.Id - Unique id for the entity
string Entity.Type - Type of the entity (Email, URL, Person, Location etc.)
string Entity.Extra - Any additional information associated with the entity, e.g. kind of credit card
List Entity.Forms - Various forms of the same entity appearing in text
int Entity.Frequency - Number of occurrences of the entity in text
double Entity.Score - Score for weighting keywords and recording the classifier’s confidence value
int Entity.TextStartLocation - First position in text
int Entity.TextEndLocation - Last position in text
List Entity.Subentities - List of subentities, which constitute parts of a given entity
SubEntity - An object describing a part of an entity, e.g. people’s first and last names, parts of addresses
string SubEntity.Title - A well-formed title of the subentity (as it appears in text)
string SubEntity.Type - Type of the subentity
int SubEntity.TextStartLocation - First position in text
int SubEntity.TextEndLocation - Last position in text
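
The structure above can be sketched as Python dataclasses. The field names follow the listing, but the Python type mapping, the default values, the id format, the subentity type names ("FirstName", "LastName"), the score, and the choice of zero-based offsets with an inclusive end position are all assumptions made for illustration. The span helper is hypothetical as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubEntity:
    Title: str
    Type: str
    TextStartLocation: int
    TextEndLocation: int


@dataclass
class Entity:
    Title: str
    Id: str
    Type: str
    Extra: str = ""
    Forms: List[str] = field(default_factory=list)
    Frequency: int = 1
    Score: float = 0.0
    TextStartLocation: int = 0
    TextEndLocation: int = 0
    Subentities: List[SubEntity] = field(default_factory=list)


def span(text, phrase):
    """Hypothetical helper: zero-based start and inclusive end of phrase in text."""
    start = text.index(phrase)
    return start, start + len(phrase) - 1


text = "Yesterday Jane Smith spoke in Auckland."
start, end = span(text, "Jane Smith")

person = Entity(
    Title="Jane Smith",
    Id="person-1",          # assumed id format
    Type="Person",
    Forms=["Jane Smith"],
    Score=0.9,              # made-up confidence value
    TextStartLocation=start,
    TextEndLocation=end,
    Subentities=[
        SubEntity("Jane", "FirstName", *span(text, "Jane")),
        SubEntity("Smith", "LastName", *span(text, "Smith")),
    ],
)

# Recover the entity text from its positions (inclusive end, as assumed above).
print(person.Type, text[person.TextStartLocation:person.TextEndLocation + 1])
```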


See also:

Taxonomy Matching - Pingar detects entities in text that are defined in a taxonomy, e.g. types of symptoms, conditions and drugs, or terms that are specific to a given domain.
Keyword Extraction - Pingar extracts key topics described in a document.
