APPARATUS AND METHOD FOR PROTECTION OF COMMUNICATIONS SYSTEMS
Cross-reference to Related Application
This application claims the benefit of U.S. Provisional Patent Application No. 60/626,324, filed November 9, 2004, the entirety of which is incorporated herein by reference.
Field of the Invention
The present invention relates generally to a communications device, and more particularly, to a proxy system for use in requesting content from another communications system while protecting a user's communications system, e.g., a computer system, from damage which can otherwise be inflicted during such a process and/or avoiding disclosure of user information, such as the user's identity, the user's personal information and tracking of network sites visited by the user. The present invention also relates to methods of obtaining content from a communications system, and more particularly, to methods of obtaining content while protecting a user's communications system and/or avoiding disclosure of user information.
Background of the Invention
Modern society has become increasingly dependent on communications networks, such as the Internet, for shopping, for entertainment and as a source of information. Modern businesses also rely heavily on their communications networks for making purchases and for obtaining up-to-date technical information, as well as for the storage and retrieval of non-public (classified and company sensitive) and employee personal information.
There are often a number of security related issues on a communications network, one of the most prevalent being the potential for compromise or infiltration by third parties. One method by which a third party could infiltrate a company communications system is by a connection to an external computer on a communications network, such as the Internet. For instance, by attaching undesired or spurious content to a message, the third party might attempt to monitor the activities of a user or group of users for a benign or harmful purpose, or attempt to retrieve some specific information from the company communications system.
The spurious content may include data mining or spy software (spyware), malicious software (malware) or advertisement software, including pop-up advertisements (adware).
Brief Summary of the Invention
In accordance with a first aspect of the present invention, there is provided a method of filling a request for content from a communications system, comprising: receiving in a proxy system a client request for content from at least one communications system; removing from the client request information identifying the client to produce an anonymous request; adding to the anonymous request information identifying the proxy system to produce a proxy request; sending the proxy request to the communications system; receiving response content from the communications system in response to the proxy request; removing undesired material (e.g., adware, spyware and/or malware) from the response content to produce filtered response content; and transmitting the filtered response content to the client.
The term "content", as used herein, means any kind of subject matter, e.g., material which can be accessed from a website (such as text, data, graphics, software, etc.).
In accordance with a second aspect of the present invention, there is provided a method of filling a request for content from a communications system, comprising: receiving from a client a client request for content from at least one communications system; removing from the client request for content information identifying the client to produce an anonymous request; adding to the anonymous request information identifying the proxy system to produce a proxy request; sending the proxy request to the communications system; receiving response content from the communications system in response to the proxy request; and transmitting at least a portion of the response content to the client.
In accordance with a third aspect of the present invention, there is provided a method of filling a request for content from a communications system, comprising: receiving from a client a client request for content from at least one communications system; sending at least a portion of the client request to the communications system; receiving response content from the communications system in response to the proxy request; removing undesired material from the response content to produce filtered response content; and transmitting the filtered response content to the client.
In accordance with a fourth aspect of the present invention, there is provided a proxy system comprising a request receiver, a request parser, a proxy system identification information inserter, a request transmitter, a response receiver and a filtered response transmitter. The request receiver receives from a client a request for content from at least one communications system. The request parser removes from the request for content information identifying the client to produce an anonymous request. The proxy system identification information inserter inserts information identifying the proxy system into the anonymous request to produce a proxy request. The request transmitter transmits the proxy request to the at least one communications system. The response receiver receives response content from the communications system. The filtered response transmitter transmits at least a portion of the response content to the client.
Preferably, the proxy system further comprises a content filter which removes undesired material from the response content to produce the portion of the response content which is transmitted to the client by the filtered response transmitter.
Preferably, the content filter, the proxy system identification information inserter and/or the request parser comprise at least one text processor.
In accordance with a fifth aspect of the present invention, there is provided a proxy system comprising a request receiver, a request transmitter, a response receiver, a content filter and a filtered response transmitter. The request receiver receives from a client a request for content from at least one communications system. The request transmitter transmits the proxy request to the at least one communications system. The response receiver receives response content from the communications system. The content filter removes undesired material from the response content to produce filtered response content. The filtered response transmitter transmits the filtered response content to the client.
Accordingly, the present invention provides a proxy system (e.g., a computer or server) which provides the capability to protect the client's communications system (e.g., computer) from being infiltrated by data-mining or spy software (spyware), malicious software (malware), or advertisement software, including pop-up advertisements (adware). The invention also makes it possible to avoid the disclosure of the client's personal information and web browsing activity.
The proxy system of the present invention can be used as an intermediary system which acts as a buffer between the client's communications system and the communications system from which content is being requested, presenting the client system's requests for information to the designated addressee and receiving the addressee's response. Another capability which can be provided by the proxy system of the present invention is the ability to remove the client's identifying information (e.g., the client's machine ID and/or the client's e-mail address) from a client's information request message before retransmitting the modified information request message to the designated addressee. When a response is received from the addressee, the proxy system provides the capability to search the response's source code (e.g., hypertext markup language (HTML) source code) for spurious web content, including spyware, malware and adware. The proxy system can remove the identified spyware, malware and adware from the received data stream before the data stream is retransmitted to the client.
The proxy system can thus prevent spyware, malware and adware from reaching and/or harming the client's communications system, and can thus act as a buffer between the client's communications system and the communications system (e.g., Internet) by removing bothersome and/or potentially harmful items before these items can infiltrate the client's communications system.
Brief Description of the Drawing Figures:
Figure 1 is a diagram of a client's interface to the Internet without the present invention.
Figure 2 is a diagram of communications paths using a proxy system for Internet communications according to the present invention.
Detailed Description of the Invention
With the ever-increasing popularity of the Internet, more personal computers (PCs) provide Internet access through web "browsers" such as Microsoft Internet Explorer or Netscape Navigator. The web browsers provide the user with the ability to access information available on the Internet. Users now frequently access the Internet using their employer's computer system for both business and personal reasons. The potential for business computers being infiltrated and possibly attacked by spyware, adware and malware is high and can cause a serious disruption in the company's business activities.
Typically, users accessing the Internet from a PC or workstation use hypertext transfer protocol (HTTP) as the communications protocol for this connection. Some of the basics of HTTP, presented in the following paragraphs, highlight the problems addressed by the present invention.
HTTP, the communication foundation of the Internet, is the request/response protocol used on top of TCP (Transmission Control Protocol) that carries commands from a user's web browser to a web server and responses from the web server back to the user's web browser. HTTP is an application-level protocol with the speed necessary for the distributed, collaborative information systems of the Internet. HTTP has been in use by the World-Wide Web global information initiative since 1990. The data transferred by the HTTP protocol can be plain text, hypertext, audio, images, or any Internet-accessible information.
HTTP is a stateless transaction-oriented client/server protocol, in which every request from a client to a web server is treated independently. A typical implementation creates a new TCP connection between a client and a web server for each transaction, then terminates the connection as soon as the transaction completes. However, the protocol does not require this one-to-one relationship between transaction and connection lifetimes and the connection can be maintained to complete additional transactions. The transaction-based approach of HTTP is beneficial for normal web applications involving retrieving a sequence of pages and documents.
Web servers and clients (users) primarily communicate using two types of HTTP messages: request messages and response messages. A request message is sent by a client to a web server to initiate some action. Examples of HTTP request commands are presented in Table 1.
Table 1 HTTP Request Commands
In response to the HTTP request message, the web server replies with an HTTP response message. An HTTP response message may include an entity body containing hypertext-based information. In addition, the response message must specify a status code, which indicates the action taken on the HTTP request. HTTP status code categories are shown in Table 2 and some examples of HTTP status codes are shown in Table 3.
Table 2 HTTP Response Status Categories
Table 3 Examples of HTTP Status Codes
In a typical HTTP configuration, a client, using a web browser, initiates an Internet request message (HTTP request message) for a resource, for instance, from a web server where desired information is located. The client's web browser opens a direct TCP connection (i.e., point-to-point) between the client's web browser and the web server. After opening the direct connection, the client's web browser issues the HTTP request message. The HTTP request message consists of a specific command (referred to as a method), a URL, and a message containing request parameters, information about the client, and perhaps additional content information. When the web server receives the HTTP request, it attempts to perform the requested action and returns an HTTP response. The HTTP response includes status category and status code (success/error) information, and a message containing information about the web server, information about the response itself, and possible data (body) content. The TCP connection is then closed.
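By way of non-limiting illustration only, the request and response message layout described above can be sketched in Python. The host name, path and header values shown are hypothetical, and the helper names (split_message, parse_request_line, parse_status_line) are assumptions introduced for this sketch rather than part of any described embodiment.

```python
# Illustrative only: the host, path and header values below are made up.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "\r\n"
)

RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<HTML><BODY>Hello</BODY></HTML>"
)


def split_message(raw):
    """Split a raw HTTP message into (start line, header dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return lines[0], headers, body


def parse_request_line(start_line):
    """Return the (method, URL, protocol version) of a request line."""
    method, url, version = start_line.split(" ", 2)
    return method, url, version


def parse_status_line(start_line):
    """Return the (protocol version, status code, reason) of a status line."""
    version, code, reason = start_line.split(" ", 2)
    return version, int(code), reason
```

In this sketch the first digit of the numeric status code corresponds to the status category, and the remaining header lines carry the information about the client or the web server referred to above.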
Whenever a web browser sends an HTTP request to a web server, the HTTP request contains information about the web browser which sent the request. The web browser may be disclosing personal information including: the computer being used, the PC's software and hardware levels, details of the web sites the user has visited, and possibly even the user's email address. Also, when the web server responds with an HTTP response message, a "cookie" is often sent to the web browser. A cookie is a message given to a web browser by a web server; it acts as a unique identifier that the web server places on the user's computer and can be used to retrieve an individual's records from the web server's database. The web browser stores the message in a text file and sends the cookie message back to the web server, as part of the HTTP request, each time the client's web browser requests information from the web server. Many organizations use "cookies" to track users' movements on their web sites. A persistent cookie, one that is stored on a user's computer until it expires or the user deletes it, can be used to collect identifying information about the user, such as web surfing behavior or user preferences for a specific web site.
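The cookie round trip described above may be illustrated, by way of example only, with the SimpleCookie class of the Python standard library. The cookie name and value shown are hypothetical, and the helper cookie_header is an assumption introduced for this sketch.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie header received in an HTTP response.
set_cookie_value = "session_id=abc123; Path=/"

# The browser records the cookie it was given.
jar = SimpleCookie()
jar.load(set_cookie_value)


def cookie_header(jar):
    """Build the Cookie header value sent back with each later request."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

Because the same identifier is returned with every subsequent request, the web server can associate those requests with a single browser, which is the tracking behavior noted above.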
An aspect of the present invention is directed to preventing the disclosure or theft of the client user's personal information, including the client's identity, personal information, credit card information and web surfing activity. For example, in one aspect, the present invention provides an intermediary buffer between the client's computer system and the Internet, and removes client identifying and personal information from HTTP request messages before the HTTP message is forwarded to the web server. The invention's ability to hide the client's identity is useful to alleviate industrial espionage related data gathering of client web activity and provides anonymous web browsing capability for the client. Another increasing problem on the Internet is the undesirable spurious content that may be attached to the information requested by the user. As noted below, the spurious content may include malware, spyware, and/or adware.
Malware refers to software that is designed specifically to damage or disrupt a computer system, and includes viruses and Trojan horses. Malware is frequently delivered to a user's PC through pop-up windows that appear when a user accesses a web site or Internet access portal. Frequently, advertisements being run from third-party advertisement web servers appear as pop-up advertisements running on a web site. A recent data-stealing malware problem used pop-up advertisements from a third-party ad web server that the malware creators had compromised (hacked) and exploited a vulnerability in the Internet Explorer web browser. When the pop-ups appeared, vulnerable versions of Internet Explorer began downloading a malicious file that recorded activity, including user passwords, from the infected PC and sent the data back to the malware creators' web server. In this manner, the malware creators were able to steal banking and credit card information, using Internet Explorer as their gateway.
Another form of malware is a Trojan horse, a program that masquerades as a legitimate program and damages or compromises the security of the computer system. Some Trojan horse programs perform malicious acts, including capturing what is on a computer's screen and what is typed in using the keyboard. Trojans can also be used to remotely control PC devices or to set up FTP, HTTP or Telnet servers on an unsuspecting user's machine.
Spyware is similar to malware (Trojans), in that users may unknowingly install a harmful program when installing some other desired program. Spyware refers to any software that covertly gathers information about the user through the user's Internet connection without the user's knowledge, usually for advertising purposes. Spyware applications are sometimes bundled as a hidden component of freeware or shareware programs that can be downloaded from the Internet. Once installed, the spyware monitors user activity on the Internet and transmits that information in the background to a third party. Spyware can also gather information about e-mail addresses and even passwords and credit card numbers, similar to malware.
Spyware adversely impacts a computer system's performance by using up the memory resources available on the user's PC and also by using part of the data transfer capability (bandwidth) of the user's Internet connection as it sends information back to its creator. Also, because spyware is running in the background, using memory and system resources, spyware can cause general system instability and lead to system crashes. Malware is designed to disrupt the operation of the computer and frequently causes system lockups or crashes.
Additionally, there is the ever-increasing use of advertising material on practically any content which a user can access. This may be especially problematic for businesses since advertising material is often graphically intensive, thus tying up computer resources for a substantial amount of time for downloading and processing. Advertisements, especially pop-up advertisements, use adware software to display the advertisements to computer users. Adware also reports the user's web surfing habits back to its home web server, which then sends the user advertisements targeted according to the reported data. While adware is not usually malicious, it uses computer memory and bandwidth resources like spyware.
The impact of malware, spyware and adware infiltrating and infesting business computer systems can be severe, manifesting itself in loss of data, loss of computer resources (crashes), data security issues and loss in productivity. This problem is also addressed by the aspect of the present invention that acts as an intermediary buffer between the client's computer system and the Internet, protecting the client's computer system from being attacked by the spurious content on the Internet.
The Internet proxy system can also be used to enhance security where a company intranet is protected by a firewall. The client company can restrict an employee's Internet access by requiring the firewall to validate a connection to the Internet proxy system before allowing the connection to be established.
As mentioned above, the present invention discloses proxy systems which comprise a plurality of components selected from among a request receiver, a request parser, a proxy system identification information inserter, a request transmitter, a response receiver, a content filter and a filtered response transmitter. The request receiver receives from a client (e.g., a personal computer or workstation) a request for content from at least one communications system. Such a request receiver can comprise, for example, a computer server, workstation or other communications device capable of receiving and handling a web browser generated message or any other standard Internet communication protocol formatted message received from any other communications system, which includes an FTP, Telnet or radio frequency (RF) connection.
The request parser removes from the request for content information identifying the client to produce an anonymous request. Preferably, the request parser comprises a text processor. A variety of text processors are well known by those of skill in the art, any of which can be employed in connection with the present invention.
The proxy system identification information inserter inserts information identifying the proxy system into the anonymous request to produce a proxy request. Preferably, the proxy system identification information inserter comprises a text processor. A variety of text processors are well known by those of skill in the art, any of which can be employed in connection with the present invention.
The request transmitter transmits the proxy request to the at least one communications system. Such a request transmitter can comprise, for example, a web browser or similar application program running on a computer server, a laptop or desktop personal computer, a handheld computer, a personal digital assistant, a cellular phone or other web-enabled communications device. A request transmitter can also be implemented using an FTP, Telnet or RF connection.
The response receiver receives response content from the communications system. Such a response receiver can comprise, for example, an Internet computer server, workstation or other communications device capable of receiving and handling a web server generated response message or any other standard Internet communication protocol formatted message received from any other communications system, which includes but is not limited to an FTP, Telnet or radio frequency (RF) connection.
The content filter removes undesired material from the response content to produce filtered response content. Preferably, the content filter comprises a text processor. A variety of text processors are well known by those of skill in the art, any of which can be employed in connection with the present invention.
The filtered response transmitter transmits the filtered response content to the client. Such a filtered response transmitter can comprise, for example, a web browser or similar application program running on a computer server, a laptop or desktop personal computer, a handheld computer, a personal digital assistant, a cellular phone or other web-enabled communications device. A filtered response transmitter can also be implemented using an FTP, Telnet or RF connection. Preferably, a filtered response transmitter can communicate the response to the client in a manner similar to that in which the request was received.
The proxy system may optionally further comprise an undesired material information receiver which collects information regarding adware, spyware and/or malware. Such an undesired material information receiver can comprise, for example, one or more application programs which further analyze the identified adware, malware and spyware. The application programs can include functions to allow manual or automatic update of a proprietary database of adware, malware and spyware with the analysis results. The proxy system can use intelligent data analysis programming to control database updating.
By preventing spyware, adware and malware, as well as web-based advertisements, including pop-up advertisements, from infiltrating the client's computer system, a proxy system according to the present invention enhances the performance of the client's computer system by conserving memory space and preserving computer system speed, and reduces or eliminates the risk of a computer virus infecting the client's system, thereby avoiding lost productivity.
The Internet stores an enormous amount of information, largely as a collection of interconnected web or HTML pages, which users can access. The majority of the content available on the Internet is represented in HTML documents that can be read or accessed by web browsers. Hypertext Markup Language (HTML) is the markup language used to create documents for the Internet. In general, software developers use markup languages to describe the structure of the document. The web browser reading the HTML document interprets the markup tags or commands to help format the document for subsequent display to a user. The web browser thus displays the document with regard to features that the viewer selects either explicitly or implicitly. Factors affecting the layout and presentation include, for instance, the markup tags used, the physical page width available, and the fonts used to display the text.
Using HTML provides the developer advantages including the ability to create "hypertext links" to other documents. HTML documents require an ordered sequence of standard HTML tags to be correctly interpreted by web browsers. In compliance with HTML and standard generalized markup language (SGML) specifications, the web browser's programming expects this specific sequence of tag information in order to properly display an HTML document to the user accessing it. Each HTML document consists of head and body text. The head contains the title, and the body contains the actual text that is made up of paragraphs, lists, and other elements.
HTML marks the various elements in a document, including headings, paragraphs, lists, and tables. Elements can contain plain text, other elements, or both. An HTML document also includes formatting commands or "tags" embedded within the text of the document that serve as commands to a web browser. The basic layout of an HTML document, including required elements, is illustrated by the following simple HTML document.
<HTML>
<HEAD>
<TITLE> Title of a web page </TITLE>
</HEAD>
<BODY>
This is an example of a simple web page.
</BODY>
</HTML>
As illustrated, the required elements of an HTML document include the <HTML>, <HEAD>, <TITLE>, and <BODY> tags, together with their corresponding </HTML>, </HEAD>, </TITLE>, and </BODY> end tags. The content of an HTML element is the text or data located between the first tag and the corresponding end tag (e.g., between <HEAD> and </HEAD>). The function of each HTML tag pair is listed in Table 4.
Table 4 HTML Tags Function
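As a non-limiting illustration, a check for the four tag pairs named above can be sketched with the HTMLParser class of the Python standard library. The names TagCollector and missing_required_tags are assumptions introduced for this sketch; the parser reports tag names in lower case.

```python
from html.parser import HTMLParser

# The four tag pairs identified in the text as required.
REQUIRED = ("html", "head", "title", "body")


class TagCollector(HTMLParser):
    """Record every start tag and end tag encountered in a document."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


def missing_required_tags(document):
    """Return the required tag names lacking a start tag or an end tag."""
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return [
        tag for tag in REQUIRED
        if tag not in collector.opened or tag not in collector.closed
    ]
```

Applied to the simple web page shown above, such a check would report no missing tag pairs.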
The proxy system according to the present invention can be used with any type of communications system. In one aspect of the present invention, a proxy system is used in connection with accessing content from the Internet by clients using a personal computer (PC) or workstation operating a web browser, such as Netscape Navigator or Microsoft Internet Explorer. The client PC can be operating standalone or as part of an intranet. For example, the client may access the Internet by establishing a TCP connection using the HTTP request-response protocol on the Transmission Control Protocol/Internet Protocol (TCP/IP). Typically, to conduct research on the Internet, a user uses a web browser to establish a point-to-point connection on the Internet. The web browser issues an HTTP request message to access individual web sites and retrieve desired information. An HTTP request or response message uses a generic message format, defined in Table 5, for requesting data and transferring the data requested.
Table 5 HTTP Message Fields
When the client connects to the Internet, the client's computer is vulnerable to infiltration or possibly attack by third parties, as depicted in Figure 1. The source of a majority of the spurious content is the target web site because the TCP connection is point-to-point. The remaining sources can usually be blocked using a firewall; such techniques are well known in the art and need no further description. The present invention uses a proxy system to prevent infiltration of the client's computer by spurious content that may be attached to the requested data.
Preferably, the client establishes a point-to-point TCP connection to the proxy system, as depicted in Figure 2, instead of establishing a point-to-point TCP connection between the client's PC and a web server containing the information desired by the client. In this manner, the proxy system acts as an intermediary buffer between the client's computer system and the Internet web server, shielding the client's computer system from disclosing client specific data, personal information or even the client's web surfing activity. To accomplish this shielding, the proxy system first receives the request message from the client, as depicted by item (1) in Figure 2. The proxy system then "deconstructs" or "parses" the HTTP request message received from the client, using a text processor program. The text processor program searches for and identifies the web server designated to be the recipient of the message, together with the status and content of the HTTP request message. The identified information is transferred to a new HTTP request message that contains the proxy system's identifying information. The proxy system then transmits the modified version of the client's HTTP request message as an HTTP request from the proxy system to the designated communications system (e.g., web server), as depicted by item (2) in Figure 2. Transmitting the modified HTTP request message prevents disclosure of client information or web browsing activity, providing the client the ability to "surf" the Internet anonymously.
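By way of non-limiting illustration, the parsing of the client request and its transfer to a new request bearing the proxy system's identifying information can be sketched as follows. The set of header fields treated as client-identifying, the proxy identifier "ExampleProxy/1.0" and the function name build_proxy_request are all assumptions introduced for this sketch; the specification does not enumerate particular header fields.

```python
# Assumption: these header names are treated as client-identifying.
IDENTIFYING_HEADERS = {"user-agent", "from", "referer", "cookie", "x-forwarded-for"}

# Hypothetical identifier for the proxy system.
PROXY_USER_AGENT = "ExampleProxy/1.0"


def build_proxy_request(raw_request):
    """Copy the request line and non-identifying headers into a new request
    that carries the proxy system's identifying information instead."""
    head, _, body = raw_request.partition("\r\n\r\n")
    lines = head.split("\r\n")
    request_line, header_lines = lines[0], lines[1:]

    kept = []
    for line in header_lines:
        name = line.partition(":")[0].strip().lower()
        if name not in IDENTIFYING_HEADERS:
            kept.append(line)  # e.g. Host, which names the designated web server

    # Insert the proxy system's own identification in place of the client's.
    kept.append(f"User-Agent: {PROXY_USER_AGENT}")
    return "\r\n".join([request_line] + kept) + "\r\n\r\n" + body
```

In this sketch the request line and the Host field, which identify the designated web server and the resource requested, pass through unchanged, while the fields describing the client are omitted from the new request.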
Preferably, the client is required to "log in" to the proxy system, which verifies the client's identity and subscription service status before permitting the client to proceed. Such a client authentication can comprise a basic log-in name and password authentication procedure or the use of a certificate (a digital credential) which is used for establishing a Secure Sockets Layer (SSL) connection. Additional authentication options include the use of an encryption device, the use of an encryption program with a user public key, such as PGP, or more elaborate procedures such as RSA SecurID, a random password generator, which can be implemented on the client side or proxy server side.
When the designated communications system (e.g., web server) responds with an HTTP response message containing a cookie, as depicted by item (3) in Figure 2, the proxy system preferably responds to any information requested with the proxy system's information and stores the cookie in a text file in a temporary memory area that is deleted when the HTTP response completing the transaction has been received. By acting as a buffer between the web server and the client's computer system, the proxy system limits the cookie to reporting the proxy system's information (i.e., the computer being used and its hardware and software levels), preventing the client's computer system information from being disclosed.
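The temporary cookie storage described above can be sketched, purely by way of example, with a context manager that creates a text file for the duration of one transaction and deletes it afterwards. The function names and the one-cookie-per-line file layout are assumptions introduced for this sketch.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def transaction_cookie_file():
    """Yield the path of a temporary cookie text file that is deleted
    when the transaction (the body of the with-block) ends."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        yield path
    finally:
        os.remove(path)


def store_cookie(path, set_cookie_value):
    """Append one received cookie to the transaction's text file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(set_cookie_value + "\n")


def read_cookies(path):
    """Return the cookies stored so far for this transaction."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

A caller would wrap the exchange with the web server in "with transaction_cookie_file() as path:", so that the cookie is held only at the proxy system and never forwarded to the client's computer system.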
When the designated communications system (e.g., web server) sends an HTTP response with the client requested data, as depicted by item (4) in Figure 2, the proxy system receives the HTTP response as a stream of data from the web server. After receiving the data, the proxy system preferably first "deconstructs" or "parses" the received HTTP response message, using a text processor program. The text processor searches the received data stream for a valid HTML tag pair by initially locating a valid "<" symbol, denoting the beginning of an HTML tag pair. After locating the initial valid left side "<" symbol, the proxy system then removes the HTTP response header information, which includes the information preceding the first valid left side "<" symbol. The proxy system continues searching the HTTP response message for the corresponding right side of the tag (i.e., the ">" symbol). After locating a complete tag pair, the proxy system allocates memory for the data contained by the tag pair, including the tag delimiters.
After identifying the valid HTML tag pairs, the proxy system then preferably searches the data contained in each individual tag pair combination for any embedded HTML tag pairs within each identified pair.
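By way of non-limiting illustration, the tag search described above can be sketched in Python. This sketch treats every "<" as the start of a tag, whereas the text refers to a "valid" symbol whose validation criteria are not specified; the function names and the use of a stack to pair start and end tags are likewise assumptions introduced here.

```python
def strip_before_first_tag(data):
    """Drop the response header information preceding the first '<'."""
    start = data.find("<")
    return data[start:] if start != -1 else ""


def find_tags(data):
    """Return (start, end, text) for each '<...>' tag; end is exclusive."""
    tags = []
    pos = 0
    while True:
        start = data.find("<", pos)
        if start == -1:
            break
        end = data.find(">", start)
        if end == -1:
            break
        tags.append((start, end + 1, data[start:end + 1]))
        pos = end + 1
    return tags


def pair_tags(data):
    """Match start and end tags. Returns (name, depth, start, end) tuples,
    where depth 0 is the outermost pair and the span includes delimiters."""
    pairs = []
    stack = []  # (name, start offset) of tags that are still open
    for start, end, text in find_tags(data):
        words = text[1:-1].split()
        if not words:
            continue
        name = words[0].upper()
        if name.startswith("/"):
            target = name[1:]
            # Find the nearest open tag with this name, discarding any
            # unclosed tags opened after it.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == target:
                    pairs.append((target, i, stack[i][1], end))
                    del stack[i:]
                    break
        else:
            stack.append((name, start))
    return pairs
```

Embedded tag pairs appear in the result with a greater depth than the pair that encloses them, so each identified pair can subsequently be examined individually.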
After the HTML tag pairs have been identified, the proxy system preferably searches the content defined by each HTML tag pair for text strings indicating the presence of malware, spyware or adware (e.g., Internet advertisements, including pop-ups). The proxy system preferably compares the text contained within the HTML tag pairs against the proprietary database of malware, spyware or adware resident on the proxy system.
Discovered malware, spyware or adware (e.g., Internet advertisements, including pop-ups) is preferably removed by the proxy system. Preferably, the removed malware, spyware or adware is replaced with a null data set as a placeholder of equal size. The proxy system preferably reformats the remaining data from the original HTTP response message, together with the portions of the original HTTP response message now containing null data, so that the remaining data is properly displayed to, and readily viewed by, the client.
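The comparison against a database of known undesired material and the equal-size replacement can be sketched, by way of example only, as follows. The signature strings stand in for the proprietary database and are hypothetical; the use of space characters as the null placeholder, and the choice to blank only the smallest flagged tag pair rather than every enclosing pair, are assumptions introduced for this sketch.

```python
# Hypothetical signature strings standing in for the proprietary database.
SIGNATURES = ("window.open(", "ads.example.net")


def blank_flagged(document, spans, signatures, filler=" "):
    """Replace each innermost flagged span with filler of equal size.

    spans are (start, end) offsets of tag pairs, end exclusive."""
    lowered = [sig.lower() for sig in signatures]
    flagged = [
        (start, end) for start, end in spans
        if any(sig in document[start:end].lower() for sig in lowered)
    ]
    # Keep only flagged spans that do not enclose another flagged span,
    # so an enclosing pair such as <BODY> is not blanked wholesale.
    innermost = [
        span for span in flagged
        if not any(
            other != span and span[0] <= other[0] and other[1] <= span[1]
            for other in flagged
        )
    ]
    out = list(document)
    for start, end in innermost:
        out[start:end] = filler * (end - start)
    return "".join(out)
```

Because the placeholder has the same length as the removed material, the offsets of the remaining tag pairs are unchanged, which simplifies any subsequent reformatting.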
When the proxy system has completed the removal of the undesired material and the reformatting of the received HTTP response message, the modified version of the HTTP response message is transmitted to the client's computer system, as depicted by item (5) in Figure 2.
To enhance the response time of the proxy system, a client is preferably allocated a portion of the memory available on the proxy system when a connection is established between the client's computer system and the proxy system. This allocated memory is used to store (e.g., cache) HTML web pages accessed by the client. After the proxy system has modified the HTTP response message, removing malware, spyware or adware, the proxy system copies the modified HTTP response message's data to the client's allocated memory cache. When the client sends another HTTP request message for the same web page, the proxy system sends an HTTP message to check the date of the file on the web server and compares the date against the date of the cached file. If the date is the same, the proxy system retransmits the cached file to the client instead of downloading it again from the web server. The use of a cache for frequently accessed web pages improves the proxy system's response time to client requests. The web pages can be retained in the allocated memory either for a fixed period of time or until the connection between the client and the proxy system is closed. Preferably, the proxy system maintains the client accessed web pages for 2 hours.
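By way of non-limiting illustration, the per-client cache with its date comparison and two-hour retention period can be sketched as follows. The class name ClientCache, its method names and the injectable clock are assumptions introduced for this sketch; the date reported by the web server is taken as an opaque string obtained elsewhere.

```python
import time

RETENTION_SECONDS = 2 * 60 * 60  # the two-hour retention period stated above


class ClientCache:
    """Filtered pages cached for one client, keyed by URL."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # url -> (file date, time stored, filtered body)

    def store(self, url, file_date, filtered_body):
        """Cache a page after undesired material has been removed."""
        self._entries[url] = (file_date, self._clock(), filtered_body)

    def lookup(self, url, server_file_date):
        """Return the cached page if its date matches the date reported by
        the web server and it has not outlived the retention period."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        file_date, stored_at, body = entry
        expired = self._clock() - stored_at > RETENTION_SECONDS
        if expired or file_date != server_file_date:
            del self._entries[url]
            return None
        return body
```

When lookup returns None, the proxy system would download and filter the page again and then call store with the new date.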
Additionally, to further enhance response time, the proxy system preferably automatically caches frequently accessed web pages, such as the CNN home page. The proxy system preferably periodically updates such frequently accessed web page content. Preferably, the proxy system updates these commonly used web pages at a repeated time interval, e.g., every 20 seconds.
Any two or more parts of the proxy systems described herein can be integrated. Any structural part of the proxy systems described herein can be provided in two or more parts. Similarly, any two or more functions can be conducted simultaneously, and/or any function can be conducted in a series of steps.