Distributed architecture: InfoCrawler was designed from the ground up as a distributed system. A 100% Java web service, it can run permanently on one or more machines. Its components communicate over XML and can be installed on separate machines: the administration, the spider, and the indexing engine.
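As a rough sketch of what XML messaging between the administration and a spider could look like (the `command`/`collection` element names here are illustrative assumptions, not InfoCrawler's actual protocol), a component might build its messages with the standard Java XML APIs:

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SpiderCommand {
    // Build a hypothetical XML command the administration could send to a
    // spider, e.g. <command action="start"><collection>intranet</collection></command>.
    public static String buildStartCommand(String collection) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element cmd = doc.createElement("command");
        cmd.setAttribute("action", "start");
        Element col = doc.createElement("collection");
        col.setTextContent(collection);
        cmd.appendChild(col);
        doc.appendChild(cmd);

        // Serialize the DOM tree to a plain XML string for transport.
        StringWriter out = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildStartCommand("intranet"));
    }
}
```

Because the wire format is plain XML, any of the three components can be moved to another machine without changing the others.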
Intuitive administration: Through its own web-based administration interface, you can administer and monitor the different collections in a very user-friendly manner. This simplicity and flexibility keeps the total cost of ownership low.
Optimized crawling: Thanks to its multi-threaded architecture, InfoCrawler can spider many collections in parallel, with multiple threads per collection.
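A minimal sketch of this crawling model, assuming one fixed-size thread pool per collection (the pool layout and the stand-in "fetch" are illustrative, not InfoCrawler's internals):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CrawlPool {
    // One thread pool per collection: collections are crawled in parallel,
    // and each collection crawls its own URLs with `threadsPerCollection` workers.
    public static Map<String, List<String>> crawl(Map<String, List<String>> seeds,
                                                  int threadsPerCollection)
            throws InterruptedException {
        Map<String, List<String>> fetched = new ConcurrentHashMap<>();
        List<ExecutorService> pools = new ArrayList<>();

        for (Map.Entry<String, List<String>> e : seeds.entrySet()) {
            String collection = e.getKey();
            fetched.put(collection, Collections.synchronizedList(new ArrayList<>()));
            ExecutorService pool = Executors.newFixedThreadPool(threadsPerCollection);
            for (String url : e.getValue()) {
                // Stand-in for an HTTP fetch + parse; here we just record the URL.
                pool.submit(() -> fetched.get(collection).add(url));
            }
            pools.add(pool);
        }

        // Wait for every collection's pool to drain.
        for (ExecutorService pool : pools) {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return fetched;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, List<String>> seeds = new ConcurrentHashMap<>();
        seeds.put("intranet", List.of("http://a", "http://b"));
        seeds.put("docs", List.of("http://c"));
        System.out.println(crawl(seeds, 4));
    }
}
```

Separate pools keep a slow collection from starving the others while still bounding the thread count per site.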
Powerful indexing: Using a powerful indexing engine, InfoCrawler can index a wide range of file types: HTML files, Microsoft Office documents, PDF, XML, and more than 240 other document types (InfoCrawler Pro only).
Open technology: InfoCrawler uses no proprietary technology: URLs are stored in a MySQL database, the web administration runs on Apache Tomcat with JSP, the administration and the spider communicate over XML, and the spider itself is 100% Java.
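To make the MySQL-plus-JDBC point concrete, here is a hypothetical sketch of a crawler's URL table and how a spider could write to it over plain JDBC. The `urls` table layout, column names, and `INSERT IGNORE` deduplication are assumptions for illustration; InfoCrawler's actual schema is not documented here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class UrlStore {
    // Hypothetical MySQL DDL for a URL frontier: one row per discovered URL,
    // keyed by the URL itself, with a crawl status and timestamp.
    public static String createTableSql() {
        return "CREATE TABLE IF NOT EXISTS urls ("
             + " url VARCHAR(1024) NOT NULL PRIMARY KEY,"
             + " collection VARCHAR(64) NOT NULL,"
             + " status VARCHAR(16) NOT NULL DEFAULT 'pending',"
             + " last_crawled TIMESTAMP NULL)";
    }

    // MySQL's INSERT IGNORE silently skips URLs that are already queued.
    public static String enqueueSql() {
        return "INSERT IGNORE INTO urls (url, collection) VALUES (?, ?)";
    }

    // How a spider could record a newly discovered URL over standard JDBC
    // (the caller supplies a Connection to the MySQL database).
    public static void enqueue(Connection conn, String url, String collection)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(enqueueSql())) {
            ps.setString(1, url);
            ps.setString(2, collection);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        System.out.println(createTableSql());
        System.out.println(enqueueSql());
    }
}
```

Because everything sits behind standard SQL and JDBC, the store could be inspected or extended with ordinary database tools.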
Flexible: Built on standards such as HTML, XML, JSP, Java, and JDBC, InfoCrawler can be integrated easily into large projects.
Agents: The integrated InfoAgents module allows the creation of push agents that automatically detect any new information on a given subject and notify you.
Ajax: Using Ajax technology, users see suggestions while they type, letting them preview the likely results before launching the search and thus reducing mistakes.
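Server-side, an Ajax suggestion endpoint boils down to prefix lookup against the indexed terms. A minimal sketch of that lookup (the class and the sample term list are assumptions, not InfoCrawler's API):

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class Suggester {
    private final TreeSet<String> terms;

    public Suggester(Collection<String> indexTerms) {
        // A sorted set lets us jump straight to the first term >= the prefix.
        this.terms = new TreeSet<>(indexTerms);
    }

    // Return up to `limit` indexed terms starting with the typed prefix;
    // an Ajax endpoint would serve these on each keystroke.
    public List<String> suggest(String prefix, int limit) {
        return terms.tailSet(prefix).stream()
                .takeWhile(t -> t.startsWith(prefix))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Suggester s = new Suggester(List.of("java", "javadoc", "jdbc", "jsp"));
        System.out.println(s.suggest("java", 5)); // [java, javadoc]
    }
}
```

The browser side then only needs to fire a background request per keystroke and render the returned list under the search box.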