E-Business Intelligence

A case study on the emerging technology of E-Business Intelligence.

Fall 1999

Nicole Wallace, Brian White, Temba Msezane.

Goizueta Business School

    
 

 

Identification of the Technology

As described by its founders, NetSapien is a breakthrough technology that allows companies to review the Internet sites of potential competitors and analyze the content of these sites and their postings for the intent of the content. That is, given client criteria for what constitutes logo infringement, attempts to fraudulently pose as another company or confuse its consumers, or other diversion of revenue belonging to the client, NetSapien can go out on the Web and interpret what messages or meanings other sites intend to send to the client's potential customers. Once NetSapien has gathered this information, human intelligence (business analysts) is applied as well to ensure a thorough business analysis. NetSapien is enabled by the following technologies:

·         a custom Webcrawler,

·         Bots,

·         an inference engine,

·         and other proprietary search tools  and algorithms

 

How exactly does NetSapien work?

 

Spidering

The search begins by using client-specified data to search continuously for sites with relevant topics. It then performs an ‘intelligent’ or deep crawl to look at each linked page, and applies an algorithm to determine whether a link should be examined further. The longer the spider searches on the client-specified topics, the ‘smarter’ it gets about them.
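NetSapien's actual spidering algorithm is proprietary, so the following is only a minimal sketch of the general idea of a topic-focused crawl. It assumes a tiny in-memory "web" (the PAGES dictionary), a simple word-match relevance score, and a fixed threshold; all of these names and values are hypothetical, and the learning behavior described above is not modeled.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("acme widgets for sale", ["b.example", "c.example"]),
    "b.example": ("cheap acme widgets and acme parts", ["d.example"]),
    "c.example": ("weather report", ["e.example"]),
    "d.example": ("acme parts catalog", []),
    "e.example": ("sports scores", []),
}

def relevance(text, topics):
    """Fraction of words on the page that match a client-specified topic."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topics for w in words) / len(words)

def focused_crawl(seeds, topics, threshold=0.2):
    """Best-first crawl: only links found on relevant pages are followed."""
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated score
    heapq.heapify(frontier)
    seen, hits = set(seeds), []
    while frontier:
        _, url = heapq.heappop(frontier)
        text, links = PAGES.get(url, ("", []))
        score = relevance(text, topics)
        if score < threshold:
            continue  # irrelevant page: its links are not examined further
        hits.append((url, round(score, 2)))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Links inherit the score of the page they were found on.
                heapq.heappush(frontier, (-score, link))
    return hits

hits = focused_crawl(["a.example"], {"acme", "widgets", "parts"})
```

In this toy run the crawl never reaches e.example, because the only page linking to it (c.example) scores below the threshold.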

Filtering

This step deletes any broken links or previously searched pages from the results.
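As an illustration only, the filtering step could look something like the sketch below. The function name, the is_reachable callback, and the example URLs are assumptions; a real system would replace the callback with an actual HTTP check.

```python
def filter_results(candidates, already_searched, is_reachable):
    """Drop duplicate, previously searched, and unreachable URLs, keeping order."""
    kept, seen = [], set(already_searched)
    for url in candidates:
        if url in seen:
            continue  # searched in an earlier pass, or repeated in this one
        seen.add(url)
        if is_reachable(url):
            kept.append(url)
    return kept

# Hypothetical stand-in for a real link check.
BROKEN = {"dead.example"}

result = filter_results(
    ["a.example", "dead.example", "b.example", "a.example", "old.example"],
    already_searched={"old.example"},
    is_reachable=lambda url: url not in BROKEN,
)
```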

Prioritizing

This is where the inference engine steps in. Here, algorithms are plugged into the engine to prioritize the results based on the client’s business criteria. The engine looks through all the pages, examining text, video, audio, hidden text, meta tags, and links, to assess how revenue generation is to occur and the intent of the content. It then groups the pages back into sites to show the most relevant examples from each site.
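The prioritizing logic itself is not public. The sketch below shows one simple way such a step might work, assuming each page has already been tagged with a set of detected signals and the client has supplied a weight per signal. The signal names, weights, and URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical client criteria: signal name -> weight.
WEIGHTS = {"sells_product": 3.0, "uses_logo": 2.0, "hidden_text": 1.5}

def score(signals):
    """Sum the client weights of the signals detected on one page."""
    return sum(WEIGHTS.get(s, 0.0) for s in signals)

def prioritize(pages):
    """Group pages by site and rank sites by their highest-scoring page."""
    by_site = defaultdict(list)
    for url, signals in pages.items():
        by_site[urlsplit(url).netloc].append((score(signals), url))
    ranked = []
    for site, scored in by_site.items():
        best_score, best_url = max(scored)  # most relevant example for the site
        ranked.append((best_score, site, best_url))
    ranked.sort(reverse=True)
    return ranked

ranked = prioritize({
    "http://shop.example/parts": {"sells_product", "uses_logo"},
    "http://shop.example/about": {"uses_logo"},
    "http://fan.example/home": {"hidden_text"},
})
```

Each entry in ranked carries the site, its score, and the single page that best illustrates why the site was flagged.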

Extracting

NetSapien technology then extracts the relevant data from each page and enters it into a database according to the client’s business criteria.  Some questions may be:

·         Is this page generating revenue?

·         Is it domestic or international?

·         Does this target specific clientele?

The last step prepares the data for easy formatting and analysis. The data is constantly updated by a learning/feedback loop that runs throughout the process.
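To make the extraction step concrete, here is a minimal sketch that stores one record per page in a database, with columns mirroring the three example questions above. The table layout, column names, and sample rows are assumptions for illustration; the source does not describe NetSapien's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE findings (
    url TEXT PRIMARY KEY,
    generates_revenue INTEGER,   -- Is this page generating revenue?
    region TEXT,                 -- Is it domestic or international?
    target_clientele TEXT        -- Does this target specific clientele?
)""")

def store(url, generates_revenue, region, target_clientele):
    """Insert a page record, replacing any earlier record for the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO findings VALUES (?, ?, ?, ?)",
        (url, int(generates_revenue), region, target_clientele),
    )

store("http://shop.example/parts", True, "domestic", "car owners")
store("http://fan.example/home", False, "international", None)

revenue_sites = [row[0] for row in conn.execute(
    "SELECT url FROM findings WHERE generates_revenue = 1")]
```

Keeping one row per URL means a later crawl of the same page simply overwrites the earlier answer, which is one simple way to keep the data current.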

 

Recent Applications

 Automotive Industry

 

 Challenge:

A major car manufacturer who sells through traditional distribution channels needed a mechanism to track and manage sales of parts and accessories over the Internet, with an emphasis on locating unauthorized distributors.

Solution:

Cyveillance identifies sites selling or distributing this client's parts and accessories via the Internet. This feedback enables the manufacturer to manage distribution channels and price points, maintain the quality of their product sold over the Internet and maximize their e-Business objectives.

 

 

Computer Industry

 

Challenge:

A leading computer manufacturer's Web site sells $15 million in merchandise daily. However, this volume has spurred other sites to emulate the look and feel of the client's popular Web site, in an effort to divert traffic and revenue that belong to the client.

Solution:

By identifying unauthorized sites diverting eyeballs and selling products, Cyveillance has prevented millions of dollars in revenue leakage each day through unauthorized sales of the client’s products.

 

Music Industry

 

·         Music Distribution

 

Challenge:

The leading recording industry organization sought a way to track and prioritize the thousands of sites that offer nearly half a million MP3 files, compressed music files typically available for free download.

Solution:

Cyveillance provided the client with a system that consistently locates and prioritizes sites containing large numbers of MP3 files, thus enabling the client to manage the situation effectively. By using Cyveillance, the association has halted the distribution of the equivalent of more than 7,000 downloadable music CDs.

 

·         Music Licensing

 

Challenge:

A top music licensing organization sought an efficient, cost-effective way to identify and collect licensing fees from commercial or promotional sites streaming music owned by its members. Given the large and ever-increasing number of sites on the Internet, undertaking this task manually was not a feasible option for the organization.

Solution:

Cyveillance teamed with the client to develop a custom application of Cyveillance’s technology to handle the task. Today this powerful technology continuously scours the Internet to identify specific song titles of works being performed on Internet sites. The technology then prioritizes the sites based on client criteria and has identified more than $6 million in music licensing opportunities.

 

New Media Industry

 

Challenge:

The Web site of a national media company contains proprietary content that is not available for distribution. Because this site employs an advertising e-Business model, lost or diverted traffic means lost ad revenue.

Solution:

Every month, Cyveillance enables the client to reclaim thousands in ad revenue that would otherwise be lost, by identifying sites stealing their proprietary content. Additionally, Cyveillance provides the publishing company with a proactive means of protecting its proprietary content.

 

Pharmaceutical Industry

 

Challenge:

A major pharmaceutical manufacturer needed a way to identify the misuse of its domain name and trademarked drug name, and any use of the drug's name in ways not consistent with its corporate policies.

Solution:

Cyveillance has identified several hundred sites misusing the drug's name within the domain, in the context of promoting an herbal alternative, and on sites selling placebo versions of that drug. This work has not only boosted revenues for the client, it has also significantly decreased the client's risk of liability and brand dilution.

 

                     

Retail Industry

 

Challenge:

A renowned retailer recently set forth a policy stating that they would be the only site authorized to sell their product over the Internet. They sought a proactive method of implementing their policy, thus controlling unauthorized distribution.

 

Solution:

Cyveillance identifies and prioritizes sites selling the retailer's products so that the retailer can effectively capture that revenue stream and prevent cannibalization of its off-line distribution channels.

 

Explanation and Profile of the Technology

Since NetSapien is a proprietary technology, we cannot gather information to determine the algorithms used to generate such relevant and useful data. But of the technologies employed, the inference component of the search engine stands out as the piece of the process (along with the unspecified algorithms) that allows the technology to be more useful than other search engines. After all, it is the inference engine that gauges the intent of the online messages being sent.

In explaining the technology, it is difficult to say how the engine is constructed. For this reason, we offer general explanations of what inference engines are (otherwise known as active logics), some constraints placed on artificial intelligence, and a general model of how the actual inference takes place (abduction versus deduction).

 

Active logics (Inference Engines)

 

Active logics are a family of inference engines that incorporate a history of their own reasoning as they run. At any time T, an active logic has a record of its reasoning at all times prior to T.  It also knows that the current time is T. As it continues to reason from time T, that reasoning is also recorded in the history, and is marked at time T+1 as having occurred at time T. Thus an active logic records the passage of time in discrete steps, and the "current" time slides forward as the system runs. It is convenient to regard its current inferences as occurring in a working memory, that is then transferred to the history (or long-term memory) in the next time-step.

 

The key aspect that distinguishes such logics from traditional temporal logics and from simple archival "dumps" is that in active logics the current time is itself noted in the working memory as Now(T), and this changes to Now(T+1) one step later. (A time-step should be thought of as very fast, perhaps 0.1 sec, in correspondence with the performance of elementary cognitive tasks by humans.) Thus active logics "ground" now in terms of real time-passage during reasoning.
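The time-stepped behavior described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a real active logic: beliefs are plain strings, rules are single premise/conclusion pairs, and the class and method names are invented for the example.

```python
class ActiveLogic:
    """Toy active logic: working memory is archived to history at each step."""

    def __init__(self, rules):
        self.rules = rules            # list of (premise, conclusion) pairs
        self.t = 0
        self.working = {("Now", 0)}   # the current time is itself a belief
        self.history = []             # list of (time, beliefs held at that time)

    def observe(self, fact):
        self.working.add(fact)

    def step(self):
        # Record the reasoning of time t, then slide "now" forward.
        self.history.append((self.t, frozenset(self.working)))
        derived = {concl for prem, concl in self.rules if prem in self.working}
        carried = self.working - {("Now", self.t)}  # Now(t) is no longer true
        self.t += 1
        self.working = carried | derived | {("Now", self.t)}

logic = ActiveLogic(rules=[("rain", "wet")])
logic.observe("rain")
logic.step()
logic.step()
```

After two steps the working memory holds Now(2) together with "rain" and the derived "wet", while the history shows that at time 0 only "rain" and Now(0) were believed.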

 

Some Problems with Traditional Artificial Intelligence (AI)

 

Critics of AI often remark that AI programs are "stupid": they do not "really" understand anything, and thus are easily thrown into disarray and made useless. To some extent this criticism is well-taken: most AI programs break down when conditions vary even slightly outside of defined bounds. A "smart" agent should be flexible enough to take in stride many kinds of incoming information: contradictions, nonsense, change of topic, ambiguity, and so on. Yet when the defined bounds are violated, systems tend not to be able to provide reasonable behaviors, such as recognizing that they cannot correctly parse the input, or that a contradiction has occurred, or that a belief must be revised. NetSapien addresses this in the filtering step, and through the application of human intelligence (in the form of the business analyst).

 

Inference by abduction versus deduction

Abductive inference is the process of concluding, from the rule A → B and the observation that B is true, that A might have caused B to be true. It is an approximate inference, meaning that abduction, in contrast to deduction, is not sound. Engineering design and configuration are likely to be abductive rather than deductive, since design is the task of finding a structure given a functional specification. For example, suppose a designer tries to realize a function F1 for the artifact being designed, and his design knowledge tells him that a component C1, among others, realizes F1, i.e. C1 → F1; the designer then concludes by abduction to select C1. If an inconsistency occurs in later design phases, he probably replaces C1 with another component that can also realize F1. Abductive reasoning can also be found in medical diagnostics: given rules of the form disease → symptoms, the doctor concludes by abduction that a disease is present because of the observed symptoms.
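The contrast between the two directions of inference can be shown in a few lines. The rule set below is a made-up example in the spirit of the medical illustration, with each rule stored as a (cause, effect) pair.

```python
# Hypothetical rules of the form cause -> effect.
RULES = [("flu", "fever"), ("measles", "fever"), ("measles", "rash")]

def deduce(cause):
    """Deduction (sound): everything that follows from a known cause."""
    return {effect for c, effect in RULES if c == cause}

def abduce(observation):
    """Abduction (not sound): causes that might explain the observation."""
    return {c for c, effect in RULES if effect == observation}

effects = deduce("measles")
possible_causes = abduce("fever")
```

Deduction from "measles" yields both of its effects with certainty, whereas abduction from "fever" yields two competing candidates, only one of which (or neither) may actually be the cause.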

 

In order to build an abductive inference engine, we need a component that is responsible for generating hypotheses and a deductive inference engine (such as the client specifications at Cyveillance). First, a hypothesis H (component or disease) must be generated to account for the fact F (function or symptoms). The deductive inference engine then tries to prove F on the basis of H. In constructing such an abductive inference engine, the problem is not generating hypotheses and testing their validity by deduction, but rather finding valid hypotheses in a controlled manner, given the enormous search space of possible hypotheses in design or diagnosis, or in the vast content of the Internet.
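A minimal sketch of this two-part structure follows, assuming a small rule table and a crude way of keeping the search controlled (ranking candidates by overlap with the observations and capping how many are tested). The rule table, function names, and the limit parameter are illustrative assumptions, not a description of how NetSapien works.

```python
# Hypothetical knowledge: hypothesis -> set of facts it would produce.
RULES = {
    "flu": {"fever", "cough"},
    "measles": {"fever", "rash"},
    "allergy": {"rash"},
}

def generate_hypotheses(observations, limit):
    """Propose at most `limit` hypotheses, those sharing most facts first."""
    candidates = [h for h, facts in RULES.items() if facts & observations]
    candidates.sort(key=lambda h: (-len(RULES[h] & observations), h))
    return candidates[:limit]

def explains(hypothesis, observations):
    """Deductive check: does the hypothesis entail every observed fact?"""
    return observations <= RULES[hypothesis]

def abductive_engine(observations, limit=2):
    """Generate a bounded set of hypotheses, keep those that pass deduction."""
    return [h for h in generate_hypotheses(observations, limit)
            if explains(h, observations)]

diagnosis = abductive_engine({"fever", "rash"})
```

Here three hypotheses share at least one fact with the observations, but only the top two are tested and only one of them accounts for both facts.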

 

 

Identification of major players/users (Links)

 

Cyveillance's clients include Bell Atlantic, Dell Computer Corp., Levi Strauss & Co., Mobil Corporation, Time Inc.-New Media, Washington Post, Newsweek Interactive, Bell South, ASCAP and the RIAA, in addition to leading companies in the pharmaceutical, financial services and computer industries, among others.

 

 

Assessment of limitations and potential

 

One of the limitations of the NetSapien technology is data context. Data context is a problem for any search agent; it results from the inability to distinguish the context in which the search terms are being used. A similar problem is the inability to perform qualitative comparisons. With the exponential growth of the Internet and the limits on search speed and thoroughness, search agents will have a tough time keeping up. Furthermore, as the volume of information continues to grow, the number of people needed to digest and analyze all the returned information will grow as well.
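The data-context problem is easy to reproduce with a naive keyword match. In this small illustration (the page texts are invented), both pages match the search term even though only the first uses it in the sense a client would care about.

```python
def keyword_hit(text, term):
    """Naive match: is the term one of the words on the page?"""
    return term.lower() in text.lower().split()

pages = [
    "Buy a Dell laptop here at half price",
    "The farmer in the dell is a nursery rhyme",
]
hits = [keyword_hit(page, "dell") for page in pages]
```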

 

Another limitation is the need to keep improving the current spiders or search agents. Although the technology is patented, it will become obsolete as new algorithms and search techniques are developed. Some current competitors include Digimarc, Inforian's Quest, and Agent Technologies' Copernic.

 

Digimarc is currently developing a fundamentally new way to access and use the Internet by embedding imperceptible digital data in traditional and digital media. This includes printed materials such as magazine advertisements, articles, covers and subscription cards; direct mailers; packaging; debit and credit cards; greeting cards; coupons; catalogues; tickets; business cards; and digital content such as video, images and other creative properties in digital form. The embedded data creates a bridge between these materials and the Internet, permitting users to link directly to relevant Web destinations without any typing or mouse clicks. Digimarc's technology gives digital capabilities to physical media, allowing new forms of interaction with the digital world and enhancing publishing, advertising and electronic commerce.

 

Several vendors offer tools that make searching for infringing material much easier than it used to be. Inexpensive applications such as Inforian's Quest and Agent Technologies' Copernic search multiple (up to 200) search engines, help gather and index searches, and simplify the entire monitoring process. These tools can locate anything from stolen code to copyrighted files being used across the Web.

 

 

Potential

 

 

History of the evolution of the technology.

 

The idea of robots as humanoid machines was first introduced in Karel Capek's 1921 play "R.U.R.," where the playwright conceived Rossum's Universal Robots. Sci-fi writer Isaac Asimov made them famous, beginning with his story collection I, Robot (1950) and continuing through a string of books known as the Robot Series.

 

On the Web, robots have taken on a new form of life. Since all Web servers are connected, robot-like software is the perfect way to perform the methodical searches needed to find information.

 

Bots were not invented on the Internet, however. Robotic software is generally believed to have originated with Eliza, one of the first public displays of artificial intelligence. Eliza is a computer program that can engage a human in conversation: Eliza asks the user a question, and uses the answer to formulate yet another question. Artificial intelligence is an advanced form of computer science that aims to develop software capable of processing information on its own, without the need for human direction.

 

A Brief History of WebCrawler

http://webcrawler.com/Help/AboutWC/WCStory.html

 

Spiders

http://www.cs.indiana.edu/~rawlins/b669-webpages/nreed/spiders.html

 

 

What's a Bot?

http://bots.internet.com/bot/what_is_a_bot.html

 

 

When to adopt technology?

 

Aggressive e-Businesses invest large sums of money to get in front of thousands of eyeballs each day and drive them to their sites to buy, browse, learn and create customer loyalty. Firms consistently face questions such as:

·         How can I recapture diverted traffic?

·         Are other sites using browser magnets to lure traffic that might otherwise arrive at my site?

·         Are other sites emulating my site to divert customers away?

·         What site features are most prevalent in specific e-Retailing environments?

 

If firms are not sure whether they’re capturing all the eyeballs that should be visiting their site, buying their products or reading their proprietary content, then they are definitely candidates for the NetSapien technology.

 

Future development and expectations

 

The Internet is changing the entire landscape of business and is affecting the way many firms interact with each other and with consumers. This new economy is projected to produce more than a trillion dollars in e-Business trade by the year 2003. The continued evolution of e-Business to support more and more business models will unfortunately lead to more cases of Internet fraud and other illegal activities on the Web. Given this, it is critical for companies to keep a finger on the pulse of all activities surrounding their business that may pose threats; hence the relevance of the NetSapien technology. In the future, we will see such technologies evolve as more firms enter this space with competing products. Cyveillance, being the first mover and creator of the NetSapien technology, will continue to build on its experience in this area while expanding its product offerings. It will become more and more evident to companies employing e-Business strategies that using technologies such as NetSapien offers an immediate and substantial return on investment.