We recently had a client, a multinational retailer with both a physical and an Internet presence. The client needed a way to retrieve certain business intelligence (BI) data from the Internet on a daily basis. After several unsuccessful attempts to build this capability themselves, they came to us for a solution.
On the surface the requirements appeared complex, and it was easy to see why their own IT team had failed to find a solution. They had been thinking "inside the box," however, and hadn't considered third-party options. The specifications required that the application perform all of these tasks:
Retrieve new product listings on competitors' web sites.
Retrieve current pricing for all products listed on competitors' web sites.
Retrieve the full text of competitors' press releases and public financial reports.
Track all inbound links pointing to competitors' web sites from other web sites.
Once the data was acquired, it needed to be processed for reporting purposes and then stored in the data warehouse for future access.
After reviewing current web-based data acquisition technologies, including "spiders" that crawl the Internet and return data which then has to be processed through HTML filters, we determined that the Google API and Web Services offered the best solution.
The Google API provides remote access to all of the search engine's exposed functionality and supplies a communication layer which is accessed via the "Simple Object Access Protocol" (SOAP), a web services standard. Since SOAP is an XML-based technology, it is easily integrated into legacy web-enabled applications.
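To make the SOAP layer concrete, here is a minimal Python sketch of how a search request might be wrapped in a SOAP 1.1 envelope. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:ExampleSearch`, the operation name `doSearch`, and the parameter names `key`, `q` and `maxResults` are placeholders chosen for illustration, not names taken from the API's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SERVICE_NS = "urn:ExampleSearch"  # hypothetical service namespace (assumption)


def build_search_envelope(api_key, query, max_results=10):
    """Return a SOAP 1.1 envelope (as a string) for a hypothetical search call."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # "doSearch" and the parameter names below are illustrative placeholders.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}doSearch")
    for name, value in (("key", api_key), ("q", query), ("maxResults", str(max_results))):
        ET.SubElement(operation, name).text = value
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_search_envelope("demo-key", "competitor press release"))
```

Because the request is plain XML, a legacy system that can already produce and post XML documents needs no HTML handling to issue it.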
The API met all of the requirements of the application in that it:
Provided a methodology for querying the Internet using non-HTML interfaces.
Enabled us to schedule regular search requests designed to harvest new and updated information on the target subjects.
Delivered data in a format that could be easily integrated with the client's legacy systems.
Using the Google API, SOAP and WSDL, our developers were able to define messages that fetched cached pages, searched the Google document index and retrieved the responses without having to filter out HTML or reformat the data. The resulting data was then handed off to the client's legacy systems for validation, reporting and further processing before reaching the data warehouse.
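The benefit of receiving structured XML rather than HTML can be sketched as follows. The sample response is a simplified, hypothetical shape (the element names `searchResponse`, `item`, `url` and `title` are assumptions for illustration, not the actual response schema); the point is that each result maps directly onto a record with no HTML filtering step.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical SOAP response used only for illustration.
SAMPLE_RESPONSE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <searchResponse>
      <item><url>http://example.com/widget</url><title>Widget</title></item>
      <item><url>http://example.com/gadget</url><title>Gadget</title></item>
    </searchResponse>
  </soapenv:Body>
</soapenv:Envelope>"""


def parse_results(xml_text):
    """Turn each <item> element into a plain dict ready for downstream systems."""
    root = ET.fromstring(xml_text)
    return [
        {"url": item.findtext("url"), "title": item.findtext("title")}
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for record in parse_results(SAMPLE_RESPONSE):
        print(record["url"], record["title"])
```

Records in this form can be passed on for validation and reporting as-is, which is the hand-off described above.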
During the Proof of Concept phase we ran tests in which we were able to reliably identify and retrieve updated public relations and investor relations information, with results that exceeded the client's expectations.
In our next test we retrieved the most current product pages listed in Google and then ran another query to retrieve the Google "cached page" versions. We ran these two data sets through difference filters and were able to produce accurate price increase and decrease reports as well as identify new products.
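A difference filter of this kind can be sketched in a few lines of Python. This is an illustration only, not the filter used in the project: it assumes product names and prices have already been extracted from the cached and current pages into two dictionaries, and the function name `price_report` and the sample data are hypothetical.

```python
def price_report(cached, current):
    """Compare cached and current {product: price} maps.

    Returns new products plus (product, old_price, new_price) tuples
    for every price increase and decrease.
    """
    report = {"new": [], "increased": [], "decreased": []}
    for product, price in sorted(current.items()):
        if product not in cached:
            report["new"].append(product)
        elif price > cached[product]:
            report["increased"].append((product, cached[product], price))
        elif price < cached[product]:
            report["decreased"].append((product, cached[product], price))
    return report


if __name__ == "__main__":
    cached_prices = {"widget": 9.99, "gadget": 20.0}
    current_prices = {"widget": 10.49, "gadget": 18.0, "gizmo": 5.0}
    print(price_report(cached_prices, current_prices))
```

Products whose price is unchanged are left out of the report, so the output contains only the items a pricing analyst would need to review.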
For our final test we used the Google API's ability to access the "link:" feature to quickly build lists of inbound links.
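As a rough sketch of this step, the query is simply the target site prefixed with the "link:" operator, and the returned URLs can be reduced to a list of distinct referring hosts. The helper names `link_query` and `inbound_hosts` and the sample URLs are hypothetical; how the result URLs are obtained from the API is assumed and not shown here.

```python
from urllib.parse import urlparse


def link_query(site):
    """Build a "link:" query string for the given site."""
    return f"link:{site}"


def inbound_hosts(result_urls, target_host):
    """Return the distinct hosts linking in, excluding the target itself."""
    hosts = {urlparse(url).netloc.lower() for url in result_urls}
    hosts.discard(target_host.lower())
    return sorted(hosts)


if __name__ == "__main__":
    urls = [
        "http://Blog.example.org/post",
        "http://news.example.net/a",
        "http://blog.example.org/other",
        "http://www.example.com/self",
    ]
    print(link_query("www.example.com"))
    print(inbound_hosts(urls, "www.example.com"))
```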
These limited tests demonstrated that the Google API was capable of producing the BI information the client requested, and that the data could be returned in a pre-defined format, which eliminated the need for post-retrieval filters.
The client was pleased with the results of our Proof of Concept phase and authorized us to proceed with building the solution. The application is now in daily use and is exceeding the client's performance expectations by a wide margin.