OrcaTec Information Discovery Toolkit
The OrcaTec Information Discovery Toolkit is an easily deployed software appliance that provides an integrated collection of information analysis and management services. Current modules include:
- Near Duplicate Clustering
- Concept Search (includes Boolean)
- Language Identification
- Interesting Phrase Finder
- Email Threading
These modules are distributed as an rPath-based software appliance, which makes them easy to install and maintain. A software appliance is an application distributed together with its own operating system and all of the supporting software needed to run and maintain it. It runs on commodity servers (x86-based systems with up to 64 GB of memory), includes its own database and data management tools, and exposes its functionality through a RESTful API that returns XML or JSON. Like a hardware appliance, it requires virtually no care or maintenance from database administrators or operations personnel. The appliance comes with an automatic installation script that installs it on a bare commodity server. Installation and all maintenance are done through an easy-to-use web interface, so administrators should never have to access the component parts. Software appliances are known for being easy to upgrade, integrate, and maintain.
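As a rough illustration of what calling a RESTful XML or JSON API can look like from client code, the sketch below builds (but does not send) an HTTP request. The host name, the `/api/concept-search` path, and the `q` parameter are hypothetical placeholders chosen for this example, not documented OrcaTec endpoints; consult the appliance's API documentation for the real ones.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, query, fmt="json"):
    """Build a GET request for a hypothetical concept-search endpoint.

    The path and parameter name are assumptions made for illustration only.
    The response format is selected through the Accept header.
    """
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    # Hypothetical endpoint path; the real API may differ.
    url = f"{base_url.rstrip('/')}/api/concept-search?{urlencode({'q': query})}"
    return Request(url, headers={"Accept": f"application/{fmt}"}, method="GET")


if __name__ == "__main__":
    req = build_search_request("http://appliance.example", "dolphin biosonar")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is only a placeholder.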
Most data centers already include a number of hardware and software appliances, such as routers, firewalls, and VPNs.
All of the OrcaTec Information Discovery Toolkit components are based on language modeling. A language model captures the patterns in language use and applies those patterns to detect, classify, and cluster information. The OrcaTec modeling approach is derived in part from years of basic research in information retrieval and cognitive science, including investigations of dolphin biosonar.
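To make the idea of a language model concrete, here is a minimal sketch of language identification using character-trigram counts with add-one smoothing. This is a generic textbook technique used as a stand-in, not OrcaTec's own modeling approach, and the function names and tiny training texts are invented for the example.

```python
import math
from collections import Counter


def trigram_counts(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def train(samples):
    """Map each language label to the trigram counts of its training text."""
    return {label: trigram_counts(text) for label, text in samples.items()}


def log_likelihood(model, text, vocab_size):
    """Log-probability of the text's trigrams under one model (add-one smoothing)."""
    total = sum(model.values())
    return sum(
        n * math.log((model[gram] + 1) / (total + vocab_size))
        for gram, n in trigram_counts(text).items()
    )


def identify(models, text):
    """Return the label whose model assigns the text the highest likelihood."""
    # Vocabulary = every trigram seen in training, plus one slot for unseen ones.
    vocab_size = len(set().union(*models.values())) + 1
    return max(models, key=lambda label: log_likelihood(models[label], text, vocab_size))


if __name__ == "__main__":
    # Illustrative toy training data; a real model would use far more text.
    models = train({
        "en": "the quick brown fox jumps over the lazy dog and then runs "
              "through the forest with the other animals",
        "es": "el zorro salta sobre el perro perezoso y luego corre por "
              "el bosque con los otros animales",
    })
    print(identify(models, "the dog and the forest"))
    print(identify(models, "el perro corre por el bosque"))
```

The same pattern counts could in principle feed classification or clustering, which is the sense in which a single modeling approach can underlie several modules.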
The modules are priced on a per-server basis. We impose no limits on the amount of data that may be processed on a single system. Version 2 supports a data ingest rate of up to 2 million documents per day (an average of roughly 23 documents per second).