Entities Identified by Pingar
This is a full list of entities identified by the Pingar Entity Extraction module.
Proper nouns
Proper nouns are unique entities such as New Zealand, Madonna, University of Auckland, or Odyssey. Unlike common nouns, which describe classes of objects (locations, people, organizations, books), proper nouns cannot be covered by a dictionary, because new entities of this kind are created every day.
Pingar identifies proper nouns based on the context of words and their intrinsic qualities, such as capitalization and typical name forms.
Person - Full or short name of a person mentioned in text - Complete
Organization - Name of an organization, company, or institution recognized in text - Beta
Location - Name of a geographic location recognized in text - Beta
Entity types coming soon include Events, Products and Brands, Companies, Educational Institutions, Venues, Movies, Songs, Performers, Art, Books, Writers, Documents, Awards, and many more.
Pattern-based entities
These entities are identified using pattern matching (regular expressions), as well as analysis of their context to avoid ambiguities.
Time
Date
People’s age
Money amounts
Bank account number
Credit card number
IRD number
URLs
Email addresses
Phone numbers
GPS locations
Addresses
We identify postal addresses listed in text by determining their sub-parts such as:
Unit - A unit, apartment, suite, room, or flat, e.g. “Suite 2”
Level - A level or floor, e.g. “Level 1”
Street - Street number, street name, and street type, e.g. “32 Beach Road”
PO Box - Postal delivery address, e.g. “Private Bag 42”
Suburb - A part of a city, e.g. “Hillcrest”
Locality - A city, town or village, e.g. “Auckland”
Post Code - A postal index, e.g. “3216”
State - A part of a country, e.g. “Waikato” or “NSW”
Country - A country, e.g. “Australia” or “NZ”
A valid address does not need to contain all of these sub-parts; a subset is enough.
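The sub-part approach described above can be sketched with a few regular expressions. This is only a minimal illustration, not Pingar's implementation: the pattern table `SUBPART_PATTERNS`, the function `find_address_subparts`, and the short list of street types are all hypothetical, and the sketch covers just the sub-parts that have a recognizable surface pattern. Sub-parts such as Suburb, Locality, State, and Country would need dictionaries or context analysis, and a bare four-digit number could just as well be a year, which is why patterns alone are not enough.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only; they are not Pingar's actual rules.
SUBPART_PATTERNS = {
    "Unit": re.compile(r"\b(?:Unit|Apartment|Suite|Room|Flat)\s+\w+", re.IGNORECASE),
    "Level": re.compile(r"\b(?:Level|Floor)\s+\d+", re.IGNORECASE),
    "PO Box": re.compile(r"\b(?:PO Box|Private Bag)\s+\d+", re.IGNORECASE),
    "Street": re.compile(
        r"\b\d+[A-Za-z]?\s+(?:[A-Z][a-z]+\s+)+(?:Road|Street|Avenue|Drive|Lane)\b"
    ),
    "Post Code": re.compile(r"\b\d{4}\b"),
}


def find_address_subparts(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (type, matched text, start, end) tuples ordered by position.

    The end offset is exclusive, following Python's slicing convention.
    """
    found = []
    for sub_type, pattern in SUBPART_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((sub_type, match.group(0), match.start(), match.end()))
    # Sort by start offset so sub-parts come back in reading order.
    return sorted(found, key=lambda part: part[2])


text = "Suite 2, Level 1, 32 Beach Road, Hillcrest, Auckland 3216"
for sub_type, title, start, end in find_address_subparts(text):
    print(f"{sub_type}: {title!r} at [{start}, {end})")
```

Because every pattern is optional, an address that contains only a street and a post code is still picked up, which matches the rule that a valid address needs only a subset of the sub-parts.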
Output format for Entity Extraction
Entities - A struct to contain lists of Entity objects
Entity - An object describing the entity found in text
string Entity.Title - A well-formed title of the entity (as it appears in text)
string Entity.Id - Unique id for the entity
string Entity.Type - Type of the entity (Email, URL, Person, Location, etc.)
string Entity.Extra - Any additional information associated with the entity, e.g. kind of credit card
List Entity.Forms - Various forms of the same entity appearing in text
int Entity.Frequency - Number of occurrences of the entity in text
double Entity.Score - Score for weighting keywords and recording the classifier’s confidence value
int Entity.TextStartLocation - First position in text
int Entity.TextEndLocation - Last position in text
List Entity.Subentities - List of subentities, which constitute parts of a given entity
SubEntity - An object describing a part of an entity, e.g. people’s first and last names, parts of addresses
string SubEntity.Title - A well-formed title of the subentity (as it appears in text)
string SubEntity.Type - Type of the subentity
int SubEntity.TextStartLocation - First position in text
int SubEntity.TextEndLocation - Last position in text
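To make the shape of this output easier to picture, the fields above can be mirrored in Python dataclasses. This is a sketch under several assumptions, not the module's actual client types: field names are converted to snake_case, `Forms` is assumed to hold strings, positions are assumed to be zero-based with an inclusive end, and the id value and the subentity types `FirstName` and `LastName` are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubEntity:
    title: str                 # SubEntity.Title
    type: str                  # SubEntity.Type
    text_start_location: int   # SubEntity.TextStartLocation
    text_end_location: int     # SubEntity.TextEndLocation (assumed inclusive)


@dataclass
class Entity:
    title: str                                       # Entity.Title
    id: str                                          # Entity.Id
    type: str                                        # Entity.Type
    extra: str = ""                                  # Entity.Extra
    forms: List[str] = field(default_factory=list)   # Entity.Forms (assumed strings)
    frequency: int = 1                               # Entity.Frequency
    score: float = 0.0                               # Entity.Score
    text_start_location: int = 0                     # Entity.TextStartLocation
    text_end_location: int = 0                       # Entity.TextEndLocation (assumed inclusive)
    subentities: List[SubEntity] = field(default_factory=list)  # Entity.Subentities


def covered_text(text: str, start: int, end: int) -> str:
    """Slice the source text, treating `end` as an inclusive position (an assumption)."""
    return text[start:end + 1]


text = "John Smith visited Auckland."
person = Entity(
    title="John Smith",
    id="person-1",  # hypothetical id format
    type="Person",
    forms=["John Smith"],
    text_start_location=0,
    text_end_location=9,
    subentities=[
        SubEntity("John", "FirstName", 0, 3),   # hypothetical subentity type names
        SubEntity("Smith", "LastName", 5, 9),
    ],
)

print(covered_text(text, person.text_start_location, person.text_end_location))
```

If the real positions turn out to use an exclusive end, only `covered_text` needs to change.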
See also:
Taxonomy Matching - Pingar detects entities in text that are defined in a taxonomy, e.g. types of symptoms, conditions, and drugs, or terms that are specific to a given domain.
Keyword Extraction - Pingar extracts key topics described in a document.