Why you should use a taxonomy and how PINGAR will help you to get the most out of it
Posted by admin on April 10, 2011
 

Have you ever considered using a taxonomy for organizing documents? If not, then you should read this blog post. You will learn about taxonomies and their advantages for finding relevant information. If you are already using a taxonomy but struggle to make the most of it, this post will show how you can save time and money by automating metadata assignment, and will give other examples of how taxonomies can help your business.

What is a taxonomy?
Taxonomy may sound like a scary scientific word, but it actually reflects a natural way in which people use language: we name things based on their properties and group them based on their similarity. A taxonomy is simply a hierarchically organized list of category names, also called terms.

For example, scientific taxonomies classify biological organisms, specifying that cats, pumas and lynxes are felines, and that felines are carnivores. Organizations and businesses define data taxonomies to provide guidelines on term usage across the enterprise and to organize documents. For example, the IPSV taxonomy categorizes Taxis, Buses and Coaches under Vehicles, and Vehicles under Road transport. A taxonomy that also contains definitions of its categories (scope notes) is sometimes called a thesaurus.
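To make the idea concrete, here is a minimal Python sketch of a taxonomy stored as a mapping from each term to its broader term, using the IPSV fragment above. The dictionary layout and the helper names `path_to_root` and `narrower` are our own illustration, not part of IPSV or the PINGAR API.

```python
# Hypothetical in-memory taxonomy: each term maps to its broader term
# (None for a top-level term). The entries mirror the IPSV example in the
# text; this is an illustration, not the actual IPSV data.
BROADER = {
    "Road transport": None,
    "Vehicles": "Road transport",
    "Taxis": "Vehicles",
    "Buses": "Vehicles",
    "Coaches": "Vehicles",
}


def path_to_root(term: str) -> list:
    """Return the chain of terms from `term` up to its top-level ancestor."""
    path = []
    current = term
    while current is not None:
        path.append(current)
        current = BROADER[current]
    return path


def narrower(term: str) -> list:
    """Return the terms directly below `term`, in insertion order."""
    return [child for child, parent in BROADER.items() if parent == term]


if __name__ == "__main__":
    print(" > ".join(reversed(path_to_root("Taxis"))))  # Road transport > Vehicles > Taxis
    print(narrower("Vehicles"))  # ['Taxis', 'Buses', 'Coaches']
```

A flat dictionary of parent links is enough for small taxonomies; larger ones are usually exchanged in a standard format and loaded into a similar structure.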

Benefits of taxonomies for finding documents
Terms listed in a taxonomy can be assigned to documents as metadata describing their content; in other words, documents can be stored under those terms. This makes taxonomies extremely useful for search, in fact far more useful than standard keywords. Here is why:

1. A taxonomy bridges the gap between the author’s and the user’s terminology
Taxonomies list alternative terms for the same category, which is very handy. For example, IPSV lists Cabs, Minicabs and Taxicabs as alternatives for Taxis. So, if you search for “cabs”, this taxonomy will redirect you to documents stored under Taxis.
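A sketch of how alternative terms could redirect a search, assuming a simple table from each preferred term to its alternatives. The table, the document store and the functions `resolve` and `search` are hypothetical names chosen for this example.

```python
# Hypothetical alternative-term table (preferred term -> alternatives),
# following the IPSV example in the text; not the real IPSV data.
ALTERNATIVES = {
    "Taxis": ["Cabs", "Minicabs", "Taxicabs"],
    "Buses": [],
}

# Hypothetical document store: preferred term -> documents filed under it.
DOCUMENTS_BY_TERM = {
    "Taxis": ["Regulations for Taxi drivers"],
}

# Case-insensitive index from every label (preferred or alternative)
# to its preferred term.
LABEL_INDEX = {
    label.lower(): preferred
    for preferred, alts in ALTERNATIVES.items()
    for label in [preferred, *alts]
}


def resolve(query: str):
    """Return the preferred taxonomy term for a search word, or None."""
    return LABEL_INDEX.get(query.strip().lower())


def search(query: str) -> list:
    """Return the documents filed under the term the query resolves to."""
    term = resolve(query)
    return DOCUMENTS_BY_TERM.get(term, []) if term else []


if __name__ == "__main__":
    print(search("cabs"))  # ['Regulations for Taxi drivers']
```

The user never needs to know that the author filed the document under “Taxis”; the index maps every known label to the same preferred term.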

2. A hierarchy of terms lets you easily broaden or narrow your search
Let’s assume that a document “Regulations for Taxi drivers” is stored under Taxis. If you are interested specifically in taxis, you will find that document under Taxis, neatly separated from documents on Buses and Coaches. However, if you have a general interest in Vehicles, you will find under that term all documents on Taxis, Buses, Coaches and vehicles in general, including “Regulations for Taxi drivers”.
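Broadening a search amounts to collecting a term together with everything below it and returning the documents filed under any of those terms. The minimal sketch below assumes each document is filed under a single term; apart from “Regulations for Taxi drivers”, the document titles are made up for illustration, as are the function names.

```python
# Hypothetical taxonomy fragment: term -> broader term (None at the top).
BROADER = {
    "Vehicles": None,
    "Taxis": "Vehicles",
    "Buses": "Vehicles",
    "Coaches": "Vehicles",
}

# Hypothetical documents, each filed under a single taxonomy term.
DOCUMENTS = {
    "Regulations for Taxi drivers": "Taxis",
    "Bus lane enforcement": "Buses",
    "Vehicle licensing overview": "Vehicles",
}


def term_and_narrower(term: str) -> set:
    """Collect `term` plus every term below it in the hierarchy."""
    collected = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in BROADER.items():
            if parent == current and child not in collected:
                collected.add(child)
                frontier.append(child)
    return collected


def documents_under(term: str) -> list:
    """Return, sorted, the documents filed under `term` or any narrower term."""
    terms = term_and_narrower(term)
    return sorted(doc for doc, filed in DOCUMENTS.items() if filed in terms)


if __name__ == "__main__":
    print(documents_under("Taxis"))     # narrow search
    print(documents_under("Vehicles"))  # broad search, includes the taxi document
```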

However, you must ask yourself: who will store the documents under taxonomy entries? Who will assign all that metadata? In a previous post we discussed that assigning metadata manually is not sustainable, because our document libraries grow too fast! To address this common problem, PINGAR offers a service that automatically finds taxonomy entries in text, and the good news is that this service just got better!

Automatic assignment of taxonomy terms to documents
The original taxonomy service was somewhat simplistic: Given a piece of text, it returned matching terms from all taxonomies that we supplied to the service. Only the names of taxonomy terms were returned without any ranking. The following changes to the service address these issues:

A taxonomy is usually defined for a specific vertical domain. For example, in the agricultural taxonomy Cancer is listed as a marine crab, whereas in the pharmaceutical taxonomy it is a human disease. You now have the option of specifying the domain that corresponds to your text, or you can use all taxonomies at once. This option also means that, if you decide to host the PINGAR API on your own server, you can supply your organization’s own taxonomy, or several of them, instead.
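The effect of choosing a domain can be illustrated with a toy lookup. This is not the PINGAR API: the domain keys, the broader terms (“Marine crabs”, “Diseases”) and the function `lookup` are invented for this sketch. Passing a domain restricts matching to one taxonomy; passing none searches them all, which is where an ambiguous term like Cancer returns several readings.

```python
# Hypothetical domain taxonomies: domain -> {term: broader term}.
# Domain keys and broader terms are invented for illustration.
TAXONOMIES = {
    "agriculture": {"Cancer": "Marine crabs"},
    "pharmaceutical": {"Cancer": "Diseases", "Aspirin": "Drugs"},
}


def lookup(term: str, domain=None) -> list:
    """Return (domain, broader term) pairs for `term`.

    With `domain` set, only that taxonomy is consulted; with domain=None,
    every taxonomy is searched, so an ambiguous term yields several readings.
    Unknown domains simply produce no matches.
    """
    domains = [domain] if domain is not None else sorted(TAXONOMIES)
    results = []
    for d in domains:
        taxonomy = TAXONOMIES.get(d, {})
        if term in taxonomy:
            results.append((d, taxonomy[term]))
    return results


if __name__ == "__main__":
    print(lookup("Cancer", "pharmaceutical"))  # [('pharmaceutical', 'Diseases')]
    print(lookup("Cancer"))  # [('agriculture', 'Marine crabs'), ('pharmaceutical', 'Diseases')]
```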

Another important addition is the scoring of taxonomy terms based on their relevance to the given text. The output is ranked by these scores, indicating which taxonomy terms primarily describe the document’s topics. For each taxonomy term we also return its locations in the text, which gives developers more options; for example, they can highlight those terms in the original documents.
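The post does not describe how the service computes relevance, so the sketch below uses a deliberately simple stand-in: the score is the number of whole-word, case-insensitive occurrences of a term. It shows the shape of ranked output with character offsets that a developer could use for highlighting; the function `match_terms` and its return format are assumptions.

```python
import re


def match_terms(text: str, terms: list) -> list:
    """Return [(term, score, [(start, end), ...])] sorted by descending score.

    The score is simply the occurrence count, a placeholder for whatever
    relevance measure a real service uses. Offsets are character positions
    in `text`, suitable for highlighting.
    """
    results = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        spans = [m.span() for m in pattern.finditer(text)]
        if spans:
            results.append((term, len(spans), spans))
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


if __name__ == "__main__":
    doc = "Taxi drivers must renew licences. Taxi fares and bus fares are regulated."
    for term, score, spans in match_terms(doc, ["Taxi", "Bus", "Coach"]):
        print(term, score, spans)
```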

We kept one useful feature of the original taxonomy service: matched taxonomy terms are grouped under their broader terms. This helps in understanding the results and in categorizing texts by their broad categories.
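Grouping matched terms under their broader terms is a single pass over the matches. In this minimal sketch the mapping (including “Fares” under “Transport costs”) and the function name `group_by_broader` are hypothetical; a term with no broader term is kept under its own name.

```python
from collections import defaultdict

# Hypothetical term -> broader term mapping; top-level terms are absent.
BROADER = {
    "Taxis": "Vehicles",
    "Buses": "Vehicles",
    "Fares": "Transport costs",
}


def group_by_broader(matched_terms: list) -> dict:
    """Group matched terms under their broader term, keeping input order.

    Terms with no broader term are collected under their own name.
    """
    groups = defaultdict(list)
    for term in matched_terms:
        groups[BROADER.get(term, term)].append(term)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_broader(["Taxis", "Fares", "Buses"]))
```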

Other applications of taxonomies
Taxonomies can be used for other tasks too. Flat taxonomies, with just one level, let you extract entities of a particular type. For example, PINGAR’s pharmaceutical taxonomy defines three types of entities: conditions, symptoms and drugs. Following this example, you can create any number of taxonomies with entities useful to your organization, such as names of departments and their abbreviations, or products and their identifiers.
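A flat taxonomy is just entity types mapped to lists of terms, so entity extraction reduces to dictionary matching. The entries below are a few illustrative examples, not the contents of PINGAR’s pharmaceutical taxonomy, and `extract_entities` is a hypothetical helper.

```python
import re

# Hypothetical flat taxonomy: entity type -> terms (a single level only).
FLAT_TAXONOMY = {
    "condition": ["diabetes", "hypertension"],
    "symptom": ["headache", "fatigue"],
    "drug": ["aspirin", "insulin"],
}


def extract_entities(text: str) -> list:
    """Return (surface form, entity type) pairs in order of appearance."""
    found = []
    for entity_type, terms in FLAT_TAXONOMY.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                found.append((m.start(), m.group(0), entity_type))
    found.sort()  # order by position in the text
    return [(surface, entity_type) for _, surface, entity_type in found]


if __name__ == "__main__":
    note = "Patients with diabetes reported fatigue after switching insulin brands."
    print(extract_entities(note))
```

The same function works for departments and abbreviations, or products and identifiers, by swapping in a different dictionary.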

Another useful application was suggested by one of our clients. They needed to check any text published on their website through a content management system for inappropriate words. It was easy enough to compile a taxonomy of such words from lists available online. The result is a new tool in the PINGAR API called Profanity Checker. It identifies politically incorrect, derogatory, libellous and offensive words and phrases, as well as innuendo. Use this tool if you need to ensure that text is safe to display to the public before its official release.
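The core of a list-based check can be sketched in a few lines. This is not how Profanity Checker is implemented; it is a minimal illustration assuming a hypothetical blocklist with two sample entries. Multi-word phrases tolerate any run of whitespace between words, and a plain list like this only catches wording it already contains, so subtler cases such as innuendo need more than exact matching.

```python
import re

# Hypothetical blocklist; a real one would be compiled from curated lists.
BLOCKLIST = ["idiot", "scam artist"]


def find_inappropriate(text: str) -> list:
    """Return sorted (start, end, matched text) for every blocklisted phrase."""
    hits = []
    for phrase in BLOCKLIST:
        # Allow any whitespace between the words of a multi-word phrase.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0)))
    return sorted(hits)


def is_safe_to_publish(text: str) -> bool:
    """True when no blocklisted phrase occurs in `text`."""
    return not find_inappropriate(text)


if __name__ == "__main__":
    print(is_safe_to_publish("Welcome to our new website."))
    print(find_inappropriate("He called the supplier a Scam  Artist."))
```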

Get started using taxonomies
Taxonomies are hierarchies of terms that help in organizing documents and in searching for information. PINGAR provides useful tools for matching document texts to taxonomies. In this blog post, we described improvements to PINGAR’s taxonomy matching service, which was originally released just one month ago. You can now choose, from a list of taxonomies, the one that best describes the domain of your document.

If you want to share your taxonomy with others, send it to us and we will include it in our public API service. If you decide to host the PINGAR API internally, you can use your own taxonomy privately. Note: many taxonomies are publicly available for both commercial and non-commercial use. Look around, and you might not need to invest in defining your own.

Our research continues, and more exciting updates are to come!
