1. What is a Taxonomy?
Accern’s Taxonomy, the Knowledge Graph (KG), is integral to accelerating Text Extraction Analytics so that it covers as many articles and news items as possible. The KG covers asset classes such as Equities, Commodities, Forex, and Cryptocurrencies, as well as Financial Events/Themes. It contains more than 50K Entities, most of which are Equities listed on U.S. exchanges (NYSE/Nasdaq/NYSEARCA), the OTC market, and international exchanges. Apart from public companies, the KG also includes private companies, Personalities, Locations, and, more importantly, Financial Events or Themes.
2. What is an Entity?
An entity is a real-world object, such as an “Organization” or “Company”, that can be denoted with a proper name. Entities are the words or strings of letters in unstructured data that the named-entity recognition (NER) system extracts as key information.
Equity: In the purview of Accern’s KG, an Equity Entity is either a publicly-traded/listed or privately held Company/Organization whose ownership is organized via shares of stock that are intended to be freely traded on a stock exchange, over-the-counter markets, or privately in the case of Private Entities. Accern has a growing collection of US and International Equity Entities in the Knowledge Graph.
Cryptocurrency: Because there are no specific recognized exchanges for this asset class, its Entities are assigned to the digital platforms where they are traded. Popular abbreviations are used as tickers for Cryptocurrency Entities.
Forex: Foreign Exchange (FOREX or FX) is the over-the-counter market in which the foreign currencies of the world are traded. Standard global abbreviations for currencies are used as tickers.
Commodity: A reasonably interchangeable economic good or material that is traded as an article of commerce on a commodities exchange. Some commodities are recognized under a specific exchange, and their respective codes and tickers are added.
3. What is an Event (Theme)?
An Event/Theme is defined as an occurrence in a business scenario. It can be a common high-level financial occurrence or it can be a specialized scenario around the business. It is textual information that helps to identify the business scenario/occurrence around a particular entity. Accern has a collection of more than 200 Events in the Knowledge Graph. These Events are curated after research on how to structure different macro and micro-level occurrences that a client cares about.
4. How to create Entities or Events?
Entities or Events are created to extract particular information or data. When creating these taxonomies, we gather metadata about the Entity or Event (Variations, Subsidiaries, Leadership, Product Variations, Exclusions, etc.) and enter it into the corresponding fields in the platform to improve extraction.
5. What are Lucene queries?
Lucene queries use the query syntax of Apache Lucene, a full-text search library. The syntax lets you search the text of documents and filter on specific parts of a document, and well-structured queries improve search quality and query performance.
We generally build a Lucene query by combining the following operators.
“OR” & “AND”: These Boolean operators combine terms. “OR” matches documents that contain any of the terms, and “AND” matches only documents that contain all of them. To create the query, we find the text variations of an event/entity and convert them into a structured query.
“?” symbol: A wildcard that matches exactly one character. For example, te?t matches both “test” and “text”.
“*” symbol: A wildcard that matches zero or more characters. For example, test* matches “test”, “tests”, and “tester”.
“Proximity” search: Finds words that are within a specified distance of each other. It is written with the “~” symbol and a number after a quoted phrase.
“NOT”: The NOT operator excludes documents that contain the terms after NOT.
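The operators above can be illustrated with a short Python sketch. The query strings are made-up examples in classic Lucene syntax, not queries taken from Accern's Knowledge Graph. The helper functions wildcard_matches and within_distance are hypothetical: they use Python's fnmatch module and a simple word-gap check as rough stand-ins for how Lucene evaluates wildcards and proximity, and they are not part of Lucene or the Accern platform.

```python
from fnmatch import fnmatchcase

# Illustrative query strings, one per operator (terms are examples only).
EXAMPLE_QUERIES = {
    "OR": '"Apple Inc" OR "Apple Incorporated"',
    "AND": "Apple AND iPhone",
    "single-character wildcard": "te?t",
    "multi-character wildcard": "acqui*",
    "proximity": '"Apple acquisition"~5',
    "NOT": "Apple NOT fruit",
}


def wildcard_matches(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms matched by a ?/* pattern (fnmatch as a stand-in for Lucene)."""
    return [term for term in terms if fnmatchcase(term, pattern)]


def within_distance(text: str, first: str, second: str, max_gap: int) -> bool:
    """Approximate a proximity search: at most max_gap words between the two words."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(a - b) - 1 <= max_gap
        for a in first_positions
        for b in second_positions
    )


if __name__ == "__main__":
    for name, query in EXAMPLE_QUERIES.items():
        print(f"{name}: {query}")
    print(wildcard_matches("te?t", ["test", "text", "tet"]))
    print(within_distance("Apple announced the acquisition", "Apple", "acquisition", 5))
```

The word-gap check only approximates Lucene's proximity scoring, which also accounts for word order, so treat it as an aid for reasoning about a query rather than a way to predict exact results.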
For more information on Lucene queries, you can refer to this source:
6. How to create and use Lucene Queries for Events or Entities?
Step 1. Refer to the metadata (Variations, Subsidiaries, Exclusions, etc.) and select appropriate keywords to form the phrases of the query.
Step 2. Research and find particular phrases in news or blogs and make a query using different search operators (AND/OR/*/?/~/NOT).
Step 3. Structure the query and review it to make sure it is effective. Then add the queries to the respective taxonomy in the platform under the Manual Search Queries field in the Advanced options.
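As a minimal sketch of Steps 1 to 3, the Python below assembles a query string from a list of variations and a list of exclusions. The function names quote and build_query are hypothetical helpers for illustration, and the phrases are examples. The sketch assumes a simple pattern (variations joined with OR, exclusions appended with NOT); real queries may also need AND, wildcards, or proximity operators.

```python
from collections.abc import Sequence


def quote(phrase: str) -> str:
    """Wrap a phrase in double quotes so it is searched as an exact phrase."""
    return '"' + phrase.replace('"', '\\"') + '"'


def build_query(variations: Sequence[str], exclusions: Sequence[str] = ()) -> str:
    """OR the variations together, then exclude each unwanted phrase with NOT."""
    if not variations:
        raise ValueError("at least one variation is required")
    query = "(" + " OR ".join(quote(v) for v in variations) + ")"
    for phrase in exclusions:
        query += " NOT " + quote(phrase)
    return query


if __name__ == "__main__":
    print(build_query(["Apple Inc", "Apple Incorporated"], ["apple pie"]))
    # prints: ("Apple Inc" OR "Apple Incorporated") NOT "apple pie"
```

The resulting string is what you would review and then paste into the Manual Search Queries field.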
Let’s go through the step-by-step process of creating a taxonomy.
Step 1. Open the “AutoML Taxonomy” section and click the “New” button to create your Entity or Event taxonomies.
Step 2. The “New Taxonomy” window will pop up. Select the “Taxonomy Type” that you'd like to build from the dropdown list and fill out the required fields accordingly. You can also explore the advanced options for more defined and granular ways to extract taxonomies.
The fields may vary for each type of Taxonomy and are generally grouped as follows:
Taxonomy Name: Name of the taxonomy (e.g., Apple Inc.).
Variations: Different textual versions of any particular taxonomy.
For example: Apple, Apple Incorporated, iPhone, Macbook, etc. You can enter multiple variations by using pipes “|” in-between each variation. (You can also use keywords from the Suggested Variations as Variations.)
Additional Details (In the case of Entities)
Asset Classification: Public/Private
Ticker: Enter the ticker of the publicly listed company, or create a random ticker for any Private Entity.
Exchange: Select from the list of exchanges in the dropdown. For anything Private, select Private Entities from the dropdown.
Share Series: Enter the share class of the listed Equity. For example: “Apple Inc. Class A” or “Berkshire Hathaway Inc. Class A” etc.
Leadership: Add the company’s board members/executives or the influential personalities associated with the company. You can enter multiple names by using pipes “|” in-between each name.
Competitors: You can enter multiple competitors by using pipes “|” in-between each competitor.
Subsidiaries: You can enter multiple subsidiaries by using pipes “|” in-between each subsidiary.
Figi: Accern allows the OpenFIGI Financial Instrument Global Identifier® to be added to taxonomies as additional information for better mapping and identification.
Figi Composite: You can enter the Composite ID for the respective Entity from OpenFIGI.
Unique Id: You can enter a Unique ID for the respective Entity from OpenFIGI.
Share Class: This is the Share Class ID provided by OpenFIGI for Entities; it should not be confused with the Share Series mentioned above.
Security Type: You can enter the Security Type listed for the respective Entity on OpenFIGI.
Indices: Select an Index to which the taxonomy belongs from the dropdown or simply create a new one and enter.
Sector: Select a Sector to which the taxonomy belongs from the dropdown or simply create a new one and enter.
Industry: Select an Industry to which the taxonomy belongs from the dropdown or simply create a new one and enter.
Manual Search Queries: Click on “Open Editor” to open a new dialog box. Select the value type “OR” or “AND”. Enter the Lucene queries in the Query Text Box and click on “Save”. For more information on how to create Lucene queries, see the Lucene sections of this document.
Expansion Rules: Click on “Open Editor” and it will open a new dialog box. Select the rules from the drop-down and check the boxes to which the rules should apply. Click on “Add Rule” and then “Save”.
Query Source: The source is “Both” by default. To apply only Lucene queries, set the source to “Manual”.
Exclusions: Add text/keywords that you do not want associated with the taxonomy. This helps identify the taxonomy more accurately by eliminating duplicate or false matches. You can enter multiple Exclusions by using pipes “|” in-between each exclusion.
Product Variations: Add different variations of products or brands owned by any particular Entity. You can enter multiple Products by using pipes “|” in-between each Product.
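Several of the fields above accept multiple values separated by pipes “|”. The following Python sketch shows one way to prepare such values before pasting them into the form. The helpers join_values and split_values are hypothetical and not part of the Accern platform. Whether the platform tolerates spaces around the pipes is not stated here, so the sketch strips them as an assumption.

```python
def join_values(values: list[str]) -> str:
    """Join several values into one pipe-delimited string, dropping blank entries."""
    cleaned = [value.strip() for value in values if value.strip()]
    return "|".join(cleaned)


def split_values(field: str) -> list[str]:
    """Split a pipe-delimited field back into its individual values."""
    return [part.strip() for part in field.split("|") if part.strip()]


if __name__ == "__main__":
    # Variations taken from the Apple example in the Variations field description.
    variations = join_values(["Apple", "Apple Incorporated", "iPhone", "Macbook"])
    print(variations)
    print(split_values(variations))
```

Keeping the values in a list and joining them at the end makes it easier to review and de-duplicate entries than editing one long pipe-delimited string by hand.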
Step 3. After finishing all the configurations, click “Add Taxonomy” to proceed.
Step 4. The newly created taxonomy will appear in the list with DRAFT status. Before using the taxonomy to build any use case, wait 4-5 minutes, then change it to LIVE status by clicking the Sync icon button (next to New and Import). Then either refresh the page or go back to the homepage and click on the AutoML Taxonomy section again.
Step 5. As you review the results and iterate, you can always edit or delete the taxonomy. Click on the Pen icon to edit and the Trash icon in the right corner to delete. If you edit a custom taxonomy, its status will change from LIVE to DIRTY. Wait 4-5 minutes after editing, then click the Sync button and refresh the page to ensure the taxonomy is in LIVE status again before you use it in any use case.
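The status changes described in Steps 4 and 5 can be summarized as a small state table. The Python below is only a sketch of that description, not the platform's implementation. The action names "sync" and "edit" are labels chosen for illustration. Any combination not described above (for example, editing a DRAFT taxonomy) is assumed to leave the status unchanged.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "DRAFT"
    LIVE = "LIVE"
    DIRTY = "DIRTY"


# Transitions described in Steps 4 and 5; "sync" and "edit" are illustrative labels.
TRANSITIONS = {
    (Status.DRAFT, "sync"): Status.LIVE,   # new taxonomy, synced after the wait
    (Status.LIVE, "edit"): Status.DIRTY,   # editing a LIVE taxonomy
    (Status.DIRTY, "sync"): Status.LIVE,   # synced again after editing
}


def next_status(current: Status, action: str) -> Status:
    """Return the status after an action; undescribed combinations keep the status (assumption)."""
    return TRANSITIONS.get((current, action), current)


if __name__ == "__main__":
    status = Status.DRAFT
    for action in ["sync", "edit", "sync"]:
        status = next_status(status, action)
        print(action, "->", status.value)
```

In every case, a taxonomy should be in LIVE status before it is used in a use case.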