Reverse engineering is a technique for analyzing the external behavior of a complex system to determine the processes involved in its inner workings.
Unlike many reverse engineering projects, we cannot "take apart" our object of study, the search engines. We will have to limit our research to examining their behavior from afar.
We will limit our research to the top search engines, which are currently:
We will build the test bed online on a publicly available web site where anyone can review and validate our results.
We will leave the tests up indefinitely, to monitor changing search engine behavior.
We will utilize unique strings to determine what text is (and is not) indexed by each search engine.
We will develop and validate additional testing methodologies during the course of the project.
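The unique-string approach mentioned above can be illustrated with a small sketch. It assumes a test page with a handful of named locations (the location names and the `build_markers` helper are hypothetical, not part of the project itself) and assigns each one a random nonsense word that is very unlikely to appear anywhere else on the web. Searching an engine for a given word then suggests whether the text at that location was indexed.

```python
import secrets
import string


def make_marker(length=12):
    """Return a random lowercase nonsense word of the given length."""
    alphabet = string.ascii_lowercase
    return "".join(secrets.choice(alphabet) for _ in range(length))


def build_markers(locations, length=12):
    """Assign one distinct marker string to each page location.

    `locations` is a list of labels chosen by the tester (hypothetical
    examples: "title", "body", "alt"). Returns a dict: label -> marker.
    """
    markers = {}
    used = set()
    for location in locations:
        marker = make_marker(length)
        # Regenerate in the very unlikely event of a collision.
        while marker in used:
            marker = make_marker(length)
        used.add(marker)
        markers[location] = marker
    return markers


# Example: one marker per place where text can appear on a test page.
markers = build_markers(["title", "body", "alt"])
for location, marker in markers.items():
    print(location, marker)
```

Each marker would be placed in its location on the test page and recorded. A later search for the marker that returns the page indicates that location was indexed; a search that returns nothing is weaker evidence, since the page may simply not have been crawled yet.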
It is very difficult to achieve certainty about many of the questions we would like to answer regarding search engine behavior, due to factors such as:
1. Search engine algorithms change constantly.
The technique that worked on Google yesterday may not work today.
2. Off-page factors are uncontrollable.
If you make a change to a web page and wait to see the effect, your test will be contaminated if a single other web page on the Internet links to your web page, or if an existing link to your web page changes in PageRank. PageRank is a sliding scale that is constantly in flux.
3. Timing is unknown.
If you make a change to one of your web pages right now, it will take some time for each search engine to crawl the page. The search engine will then take an unknown amount of time to include the page in its database. Different page factors may be included in the database at different times. The page may be placed in a temporary holding area in the database, where it shows up in the SERPs (search engine results pages), before eventually being placed in the main database.
The tests we are currently conducting are: