FAQ

Frequently Asked Questions

What are the system requirements?
Does InfoCrawler have a Web or Windows interface?
Is it easy to set up and configure InfoCrawler? How long will it take?
What could be the reasons why a web site is not crawled?
One of the InfoCrawler services does not start. What should I do?
What is the default password to administer InfoCrawler?

 

 

What are the system requirements?
InfoCrawler requires one of the following operating systems:

  * Linux
  * Unix
  * Microsoft Windows XP/NT4/2000/2003 Server

Your computer requires at least 136 MB of disk space for the application.
It also needs at least 256 MB of RAM and a Pentium III-compatible processor or higher.
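
As a quick pre-installation check, the disk-space figure above can be verified with a few lines of Python. This is only an illustrative sketch and not part of InfoCrawler; the function names `free_disk_mb` and `disk_ok` are hypothetical, and the path passed in is assumed to be on the volume where you plan to install.

```python
import shutil

# Disk space quoted in this FAQ for the application itself.
REQUIRED_DISK_MB = 136


def free_disk_mb(path="."):
    """Return the free space, in MB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def disk_ok(free_mb, required_mb=REQUIRED_DISK_MB):
    """True if the free space meets the stated minimum."""
    return free_mb >= required_mb


if __name__ == "__main__":
    free = free_disk_mb(".")  # "." is a placeholder for the install volume
    status = "OK" if disk_ok(free) else "insufficient"
    print(f"{free:.0f} MB free, {REQUIRED_DISK_MB} MB required: {status}")
```

The index itself will need additional space that depends on how much content is crawled, so treat the result as a minimum check only.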


Does InfoCrawler have a Web or Windows interface?
The Administration Tool and the Search Interface are Web-based and can therefore be accessed from various operating systems such as Windows, Mac, Unix, etc.


Is it easy to set up and configure InfoCrawler? How long will it take?
Yes. The installation program sets up the components that enable the server to receive requests submitted by users. Network administrators then only need to specify the files, servers, folders and Web sites to be indexed. Once the index is created, users can start submitting queries.

Using a Web browser, the administrator can log on to InfoCrawler’s administration tool and modify various settings. Even complex implementation projects can be completed within a few days, which helps meet short deployment schedules and tight budgets.

What could be the reasons why a web site is not crawled?

There could be many reasons:

  • URL extension: Verify that the URL extension is not unusual and that it is included in the indexed extensions (collection parameters/Indexing tab/File extensions tab). By default, InfoCrawler is configured with all the known extensions (html, asp, php, jsp, etc.).
  • Redirections: Some web sites redirect their base URL to another server. For example, the web site http://www.news.com is redirected to http://news.com.com/. If the crawling options specify that the crawler must stay on the same server, InfoCrawler will not follow that redirection. The solution is to find the final server URL and give that URL to InfoCrawler instead of the original one. In this example, you would enter http://news.com.com/ instead of http://www.news.com. You can use a web browser to find the final URL.
  • Meta Robots: If you have enabled the option “Test Meta Robots” and the web site you are crawling uses the META tags "NOINDEX,NOFOLLOW", InfoCrawler will not crawl that site. To check a site's META tags, load the site's home page in a web browser and view the source of the page (generally using the Edit menu). Disable the option “Test Meta Robots” to avoid this particular case.
  • Parse scripts: Some web sites redirect their home page using JavaScript. If you disable the “Parse scripts” parameter, InfoCrawler will not be able to follow that redirection. Enable this option to crawl more web sites.
  • Robots.txt: The web site administrator can use this file to disallow access to all or part of the web site. You can choose whether or not to follow these recommendations. Disable this option to crawl more web sites.
  • Use head: This option optimizes communication, but not all web servers support it. Disable this option if you want to avoid communication problems with some web servers.
  • Minimum/maximum file size: If you specify a minimum or maximum size for files to be crawled and the welcome page of the crawled site falls outside these limits, then InfoCrawler will not crawl that site.
  • Flash: InfoCrawler does not currently support links that are embedded in Macromedia Flash. Webmasters who use Flash usually also include the URLs in HTML (even if hidden) so that crawlers can access the site's web pages.
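
Some of the checks in this list (redirection to another server, META robots tags, robots.txt rules) can be reproduced by hand before changing crawler settings. The Python sketch below is illustrative only and says nothing about how InfoCrawler implements these options. The function names `meta_robots`, `same_server` and `robots_txt_allows` are hypothetical, only the Python standard library is used, and the page source and robots.txt content are assumed to have been saved or copied from a browser beforehand.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class _MetaRobots(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def meta_robots(html):
    """Return the set of robots directives found in a page's source."""
    parser = _MetaRobots()
    parser.feed(html)
    return parser.directives


def same_server(start_url, final_url):
    """True if a redirect target is on the same host as the starting URL."""
    return urlsplit(start_url).hostname == urlsplit(final_url).hostname


def robots_txt_allows(robots_txt, url, user_agent="*"):
    """True if the given robots.txt text permits fetching `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="NOINDEX,NOFOLLOW"></head></html>'
    print(sorted(meta_robots(page)))  # ['nofollow', 'noindex']
    print(same_server("http://www.news.com", "http://news.com.com/"))  # False
    print(robots_txt_allows("User-agent: *\nDisallow: /\n",
                            "http://news.com.com/index.html"))  # False
```

A `False` from `same_server` corresponds to the redirection case, `noindex` or `nofollow` in the result of `meta_robots` to the Meta Robots case, and a `False` from `robots_txt_allows` to the Robots.txt case.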

 

One of the InfoCrawler services does not start. What should I do?

Usually, InfoCrawler does not start because the listening port is already in use. In that case, change the listening port and restart the service.
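
To confirm that a port conflict is the cause before changing any settings, you can test whether something is already listening on the port. The following Python sketch is an assumption-laden illustration and not an InfoCrawler tool: `port_in_use` is a hypothetical helper, and the port number 8080 is a placeholder, since this FAQ does not state which port InfoCrawler listens on.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a process already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. when
        # another process is listening on that port.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 8080  # placeholder: replace with the configured listening port
    if port_in_use(port):
        print(f"Port {port} is already in use; choose another listening port.")
    else:
        print(f"Port {port} appears to be free.")
```

Run the check while the InfoCrawler service is stopped; otherwise the service itself may be the process holding the port.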

If the problem persists, start the service in a DOS box instead of launching it as an NT service. Open a DOS box, go to the InfoCrawler bin directory (for example “D:\InfoCrawler\bin”) and launch the command file “FtCrawler.bat”. This gives you more details about the errors that are occurring.

For the administration part, you can do the same thing by launching the command file “startup.bat”, found in the bin directory (for example “D:\InfoProducts\bin”).

What is the default password to administer InfoCrawler?

By default, the administrator password is “admin”. It is recommended that you change it.

 
