Overview of the Internet and the World-Wide Web

(This material last modified

Internet Overview

First computers (even PCs) were stand-alone machines

To transfer information from one computer to another required transporting cards, tape, disk, diskette, ....

Better way to share information is to create connection

Local Area Network (LAN) with 2 or more computers sharing information

Unexpected result -- electronic mail (email)

Connect networks to make network of networks ....

A network can withstand failures of some connections and still continue to operate

Research and development projects in 1980s sponsored by National Science Foundation (NSF) and Department of Defense (DOD)

"Internet" is short for "internetwork" -- a network of interconnected networks

A "protocol" is a standardized method -- in this case for communication

Internet transfers messages (information) as packets of characters (bytes)
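As a rough illustration of the packet idea, the sketch below splits a message into small numbered packets of bytes and puts them back together. The 8-byte packet size and the function names are assumptions made for this example only; real IP packets are larger and carry more header information.

```python
def to_packets(message, size=8):
    """Split a message into (sequence number, bytes) packets of at most `size` bytes."""
    data = message.encode("utf-8")
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def from_packets(packets):
    """Reassemble the message, even if the packets arrived out of order."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered).decode("utf-8")


packets = to_packets("Hello from the Internet")
# Simulate out-of-order arrival by reversing the packets before reassembly.
restored = from_packets(reversed(packets))
```

The sequence numbers are what allow the receiver to rebuild the message when packets take different routes and arrive in a different order.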

Computer that is directly connected to Internet is called "Internet node"

Each Internet node has software that transfers information using the Internet Protocol (IP)

Most (but not all) Internet nodes have IP Host Name --
ocean.cs.purdue.edu
indynet.indy.net
inet.d48.lilly.com
alw.nih.gov
blackbird.afit.af.mil
philadelphia.libertynet.org

Each Internet node MUST have IP Host Number (IP address) --
128.10.2.28
199.3.65.1
40.33.1.1
128.231.128.7
129.92.1.2
128.91.1.141
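A dotted host number of this kind (IPv4) is a convenient way of writing a single 32-bit number, one byte per field. The sketch below uses Python's standard ipaddress module to show the conversion in both directions; the two function names are made up for the example.

```python
import ipaddress


def host_number_to_int(dotted):
    """Convert a dotted IPv4 host number such as '128.10.2.28' to its 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_host_number(value):
    """Convert a 32-bit integer back to dotted form."""
    return str(ipaddress.IPv4Address(value))


number = host_number_to_int("128.10.2.28")
# 128*2**24 + 10*2**16 + 2*2**8 + 28
round_trip = int_to_host_number(number)
```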

At end of IP name...
Commercial (.com)
Education (.edu)
Government (.gov)
Military (.mil)
Internet (.net)
Non-profit organizations (.org)

Outside the US, different naming schemes. Typically last two characters of IP name identify country (toronto.cbc.ca, mitsubishi.co.jp, club.eng.cam.ac.uk)
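The naming rules above can be sketched as a small lookup on the last part of the host name. This is only an illustration: the function name is hypothetical, the category labels are copied from the list above, and any two-letter ending is simply reported as a country code without checking that the country exists.

```python
US_CATEGORIES = {
    "com": "Commercial",
    "edu": "Education",
    "gov": "Government",
    "mil": "Military",
    "net": "Internet",
    "org": "Non-profit organizations",
}


def classify_host(host_name):
    """Describe a host name by the part after its last dot."""
    suffix = host_name.lower().rstrip(".").rsplit(".", 1)[-1]
    if suffix in US_CATEGORIES:
        return US_CATEGORIES[suffix]
    if len(suffix) == 2 and suffix.isalpha():
        return "Country code: " + suffix
    return "Unknown"


examples = [classify_host(name) for name in
            ("ocean.cs.purdue.edu", "toronto.cbc.ca", "mitsubishi.co.jp")]
```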

The .edu domain is for "colleges and universities".

The .us domain is used predominantly by state government agencies, K-12 schools, community colleges, technical/vocational schools, and private schools

For example, the Lafayette Indiana School Corporation's Website is at http://www.lsc.k12.in.us

Transmission Control Protocol (TCP) and Internet Protocol (IP) software provide communications -- TCP/IP

World-Wide Web Overview

World-Wide Web provides Internet users with means to access variety of media in simplified fashion

Internet refers to the physical side of the global network plus TCP/IP software

Web refers to a body of information -- abstract space of knowledge

CERN -- collective of European high-energy physics researchers

CERN (Conseil Européen pour la Recherche Nucléaire) members located in number of different countries

In 1989 Tim Berners-Lee (from England) proposed global hypertext (Web) project

Documents can have links (connections) to other documents and other things

Hypertext links (hyperlinks) can create a complex virtual web of connections

http://www.wunderground.com/index.html

Possible to represent nearly any file or service on the Internet with a URL

First part of URL (before two slashes) specifies method of access

Second part -- typically address of computer where data or service is located

Further parts may specify name of file, port to connect to, or text to search for in document

http://www.wunderground.com works because index.html is the default file
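The parts of a URL can be pulled apart with Python's standard urllib.parse module, as in the sketch below. The describe function, the 8080 port, the /maps/ path, and the q=rain search text are hypothetical; filling in index.html assumes the server is configured with that default file, which is common but not guaranteed.

```python
from urllib.parse import urlsplit


def describe(url, default_file="index.html"):
    """Break a URL into the parts described above."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file named: assume the server falls back to its default file.
        path += default_file
    return {
        "method": parts.scheme,      # first part, before the two slashes
        "computer": parts.hostname,  # second part, address of the computer
        "port": parts.port,          # None when the URL does not name a port
        "file": path,
        "search": parts.query,
    }


short_form = describe("http://www.wunderground.com")
long_form = describe("http://www.wunderground.com:8080/maps/?q=rain")
```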

Client-server paradigm

Server program sends copies of documents on request

Requires computer on Internet and server software always running

Client program sends message to server to request copy of document

Clients and servers communicate via TCP/IP

Client and server may establish "persistent connection" so that all pages after first arrive more quickly
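The client-server exchange can be sketched with two small functions talking over TCP on the local machine. This is a toy, not a Web server: the one-line request format, the DOCUMENTS table, and the function names are assumptions made for the example, and a real server speaks HTTP and handles many clients.

```python
import socket
import threading

# Hypothetical documents the toy server can hand out.
DOCUMENTS = {"/index.html": "<html><body>Hello from the server</body></html>"}


def serve_one_request(listener):
    """Server: wait for one client, read the requested path, send back a copy."""
    conn, _addr = listener.accept()
    with conn:
        path = conn.recv(1024).decode("ascii").strip()
        body = DOCUMENTS.get(path, "NOT FOUND")
        conn.sendall(body.encode("ascii"))


def request_document(port, path):
    """Client: connect, send the request message, read the reply until the server closes."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall((path + "\n").encode("ascii"))
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the operating system pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_one_request, args=(listener,), daemon=True)
    server.start()
    try:
        return request_document(port, "/index.html")
    finally:
        server.join(timeout=5)
        listener.close()


document = demo()
```

The server here closes the connection after every document. A persistent connection would instead keep the same socket open for the next request, saving the cost of connecting again.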

Operating Systems

Operating system is a program that manages the computer's hardware and the other programs that run on it

Computers (nodes) on the Internet use several different operating systems: Microsoft Windows (95, 98, 2000, Millennium, NT, XP), Unix, Linux, Macintosh

Operating system provides user interface, a way for the user to interact with the operating system

Graphical User Interface (GUI)

Command line interface

Web browsers are available for each kind of operating system

Much of the early Internet was built using UNIX machines

Editors

Word, WordPerfect, and similar editors store as binary, but can often save as HTML text files

Notepad, Wordpad, and similar editors create text files -- useful as Web files

File Transfer Protocol (FTP)

FTP used to transfer files from one computer on the Internet to another

Often useful for building Web pages on one computer and FTP'ing them to computer that has Web server

Telnet and SSH

In days of stand-alone computers, in order to execute instructions on a computer, user had to be physically present

Telnet protocol, first developed in 1969 and standardized in 1983, provides for remote access

With telnet can type in Unix commands to one computer from window opened on another computer

Secure Shell (SSH), 1995, designed to take the place of Telnet

SSH is a separate protocol that provides Telnet-style remote login in a secure manner; operates over TCP and provides strong authentication and encrypted communications

Electronic Mail

Email address consists of

user-id@IP-Host-Name

fkjohnson@prozac.lilly.com

Mail tools (Elm, Pine, Pegasus, Eudora, Netscape, Microsoft Outlook) major features:

Create message

Send message to one or more people

Send "courtesy copies" (cc) to one or more people

Send "blind courtesy copies" (bcc) to one or more people (usually yourself)

Read message

Save message (usually in "folders")

Forward message to someone else

Reply to message

Aliases for often-used email addresses --
johnson: fkjohnson@prozac.lilly.com

Aliases for groups of email addresses --
officers: dlwilson@prozac.lilly.com, brsmith@acctg.lilly.com, fkjohnson@prozac.lilly.com, dunsmore@purdue.edu

Delete message
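Two of the ideas above, the user-id@IP-Host-Name form of an address and aliases, are easy to sketch. The addresses are the ones used in these notes; the function names and the way the alias table is stored are assumptions, since each mail tool keeps aliases in its own format.

```python
ALIASES = {
    "johnson": ["fkjohnson@prozac.lilly.com"],
    "officers": [
        "dlwilson@prozac.lilly.com",
        "brsmith@acctg.lilly.com",
        "fkjohnson@prozac.lilly.com",
        "dunsmore@purdue.edu",
    ],
}


def split_address(address):
    """Split an email address into (user-id, host name)."""
    user_id, host_name = address.rsplit("@", 1)
    return user_id, host_name


def expand_recipients(names):
    """Replace aliases by the addresses they stand for, dropping duplicates."""
    expanded = []
    for name in names:
        for address in ALIASES.get(name, [name]):
            if address not in expanded:
                expanded.append(address)
    return expanded


recipients = expand_recipients(["johnson", "officers"])
```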

Mail Servers

Simple Mail Transfer Protocol (SMTP) -- sends email messages from your machine to outside world

Post Office Protocol (POP) -- download incoming email messages to local machine

By default, server erases any messages that have been downloaded to client machine

Best to use if you always read email from same machine

Internet Message Access Protocol (IMAP) -- keep messages on IMAP server

Best to use if you read email from various machines

Mail Guidelines

Make the Subject line useful

Avoid "Hi" or "Hello"

Decide on subject after writing email message by choosing most relevant phrase

To: Martin Noreke <noreke@purdue.edu>
Subject: You can import Excel data into PowerPoint
Date: Thu, 06 Jul 2003 15:08:52 -0500
From: Buster Dunsmore <dunsmore@purdue.edu>

Martin,

You can import Excel data into PowerPoint by LINKING to the original data in Excel. If you update the linked data in Excel, you have also changed the PowerPoint version as well.

Buster
=========
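A message like the one above is just header lines followed by a body. The sketch below builds it with Python's standard email.message module; it only constructs the text and does not send anything. Handing the result to an SMTP server (for instance with the standard smtplib module) would be the next step and is left out here.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["To"] = "Martin Noreke <noreke@purdue.edu>"
msg["From"] = "Buster Dunsmore <dunsmore@purdue.edu>"
msg["Subject"] = "You can import Excel data into PowerPoint"
msg.set_content(
    "Martin,\n\n"
    "You can import Excel data into PowerPoint by LINKING to the original "
    "data in Excel.\n\n"
    "Buster\n"
    "=========\n"
)

# The full text that would travel over the network: headers, blank line, body.
wire_text = msg.as_string()
```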

If you reply to a message, most mail tools will automatically create a Subject line for you

Subject: Re: You can import Excel data into PowerPoint

Be sure to change that Subject line when the conversation drifts away from the original subject

Check your spelling

Use good grammar

Use (but don't overuse) *asterisks* or CAPITAL LETTERS for emphasis

Include relevant lines from the other person prefaced by > in your reply...


> Do you think we should have the
> meeting at 2:00 today or would it be
> better to have it tomorrow?

I would prefer tomorrow.
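What a mail tool does when you reply can be sketched in a few lines: it adds "Re: " to the Subject (once) and prefixes the quoted lines with ">". The function names are hypothetical, and real mail tools differ in details such as line wrapping.

```python
def reply_subject(subject):
    """Prefix the subject with 'Re: ' unless it already has that prefix."""
    if subject.lower().startswith("re:"):
        return subject
    return "Re: " + subject


def quote(original_text):
    """Preface every line of the other person's text with '> '."""
    return "\n".join("> " + line for line in original_text.splitlines())


def build_reply(original_text, answer):
    """Quoted lines first, then a blank line, then the answer."""
    return quote(original_text) + "\n\n" + answer


reply = build_reply(
    "Do you think we should have the\nmeeting at 2:00 today or would it be\nbetter to have it tomorrow?",
    "I would prefer tomorrow.",
)
```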

No Flaming -- Don't say anything via email that you would not say face-to-face in front of your mother

"Sign" your email message at least with your name followed by a line of dashes, underscores, or equal signs...

Betty Williamson
==================

Make sure to "sign" the email message with your real name ... especially if your email address is NOT your real name

An unsigned email message from bigstinky@hotmail.com is like a phone call from someone who refuses to identify himself

(The "signing" rule above may be relaxed if you are certain the recipient knows who you are....)

Keep signatures to 6 lines or fewer...

==============================
Dr. H. E. Dunsmore            
Department of Computer Science
Purdue University             
West Lafayette, Indiana 47907
==============================

Mail Attachments

Extended email to include other types of media beyond plain text

Images, sounds, animations, word processing files, etc.

Modern email clients use: MIME (Multipurpose Internet Mail Extensions), which commonly encodes attachments with Base64

File is encoded to characters for mail transport
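The encoding step can be tried directly with Python's standard base64 module. The six bytes below are the usual signature at the start of a GIF image ("GIF89a"); they stand in for a whole file here, which is an assumption made to keep the example short.

```python
import base64

# First six bytes of a typical GIF file, standing in for a full attachment.
raw = bytes([0x47, 0x49, 0x46, 0x38, 0x39, 0x61])

# Sender: turn arbitrary bytes into plain characters that survive mail transport.
encoded = base64.b64encode(raw).decode("ascii")

# Receiver: turn the characters back into the original bytes.
decoded = base64.b64decode(encoded)
```

Every 3 bytes of the file become 4 characters, so an encoded attachment is roughly one third larger than the original file.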

File extension identifies what it is...

.txt = text
.gif, .jpg, .jpeg, .bmp = image
.exe = executable program
.doc = Word file
.htm, .html = HTML file

Avoiding Virus Infections

Virus is a program usually disguised as something else

Causes some unexpected and usually undesirable event

Virus often designed to spread automatically to other computer users

Effects range from just sending virus to others, sending files to someone else, deleting some files, to completely disabling computing system

Viruses can be transmitted as email attachments, downloaded programs from Web, or on diskette or CD

User that virus came from may be unaware that he or she has it

Some viruses work immediately

Others are "time bombs" until date or situation causes virus to go to work

Some protection by only opening attachments and programs from trusted individuals and organizations

More protection from anti-virus software that screens email attachments and checks files periodically and removes any viruses found

No completely fool-proof way to avoid viruses

Be careful about .exe, .doc, and .vbs attachments or any such files given to you by any means

Symantec Security Website

Hoaxes

Virus hoax is false warning about a computer virus

Typically warning arrives via email with well-meaning people forwarding it to others ad nauseam

Recent hoax warned people to look for file named Sulfnbk.exe and to delete it (bad virus!) immediately. Sulfnbk.exe is actually a standard Windows utility (it restores long file names). Deleting it removes a legitimate system file, not a virus.

If you get message about virus, check it out by going to one of the leading virus Web sites

Hoaxes are not limited to viruses. Internet and Web are full of mis-information.

Problem -- anyone can publish anything via email and Websites. Appearing on computer screen gives it air of authenticity.

Net Hoaxes

Removal from Mailing Lists

Request removal from "reputable" mailing list

No response is often best way to be removed from junk mail or undesirable mailing lists

Spam

Spam = junk email

Now greater than 50% of all email

Spammers gather email addresses by purchasing mailing lists and scanning web pages and newsgroups

Spammers rarely remove an email address from their database upon request

Request confirms that email address actually exists and is being used

Some spam filtering may be done by ISP

Further filtering can be done by user

SpamAssassin examines each message to detect if it is a junk email message

Netscape email tool places suspected spam in Junk folder; its filter learns from the messages the user marks as junk

Web Browsers

Popular graphical Web browsers began at National Center for Supercomputing Applications (NCSA), University of Illinois

NCSA's Software Design Group -- produced versatile, multi-platform interface to World-Wide Web -- called it Mosaic

Mosaic was created during a four-month period in late 1992 and early 1993 by Marc Andreessen and some other students

Due to easy, point-and-click hypermedia interface, Mosaic set standard for Web interfaces

Marc Andreessen left University of Illinois to become Vice President for Technology of Netscape Communications

First version of Netscape browser available October, 1994

Big improvement -- Continuous document streaming, enabling users to view documents while they were still being downloaded rather than waiting for the entire document to load

Netscape has been responsible for advances in HTML

Microsoft's Internet Explorer also based on Mosaic browser

Developed specifically for use with Windows 95 operating system

Cookies

"Cookie" is little morsel of information sent from Web server to Web browser (usually during first visit)

After browser receives cookie, whenever it requests Web page from that server, it sends back that cookie

Shopping applications can store information about past (or currently) selected items

Fee services can send back registration information -- freeing client from typing IDs and passwords

Sites can store user preferences on client

Cookie might be...

.gap.com,shirts,pants
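The round trip of a cookie can be sketched with Python's standard http.cookies module. The cookie name "cart" and the "|" separator are assumptions made for this example; the notes only say that the cookie carries the site and the selected items.

```python
from http.cookies import SimpleCookie

# Server, first visit: create the cookie and the text of the Set-Cookie header.
server_cookie = SimpleCookie()
server_cookie["cart"] = "shirts|pants"        # hypothetical name and value
server_cookie["cart"]["domain"] = ".gap.com"
set_cookie_header = server_cookie["cart"].OutputString()

# Browser: remember the cookie, then send it back with a later request.
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_header)
cookie_header = "; ".join(
    f"{name}={morsel.coded_value}" for name, morsel in browser_jar.items()
)

# Server, later visit: read the returned cookie to recover the selected items.
returned = SimpleCookie()
returned.load(cookie_header)
items = returned["cart"].value.split("|")
```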

Why keep cookie at browser instead of server? Faster!

Cookies can be modified, removed, expired

Cookie limit -- browser stores only a limited number of cookies; when limit is reached, cookies used least recently are removed
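Removing the least recently used cookie when the limit is reached can be sketched with an ordered dictionary. The CookieStore class and the limit of 2 are hypothetical; actual browsers use much larger limits and their own storage.

```python
from collections import OrderedDict


class CookieStore:
    """Keeps at most `limit` cookies, discarding the least recently used one."""

    def __init__(self, limit):
        self.limit = limit
        self._cookies = OrderedDict()  # oldest use first, most recent use last

    def set(self, name, value):
        self._cookies[name] = value
        self._cookies.move_to_end(name)
        while len(self._cookies) > self.limit:
            self._cookies.popitem(last=False)  # drop the least recently used

    def get(self, name):
        if name not in self._cookies:
            return None
        self._cookies.move_to_end(name)  # reading a cookie counts as using it
        return self._cookies[name]

    def names(self):
        return list(self._cookies)


store = CookieStore(limit=2)
store.set("a", "1")
store.set("b", "2")
store.get("a")        # "a" is now the most recently used cookie
store.set("c", "3")   # over the limit, so "b" is removed
```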

Search Engines

Robots, Crawlers, Worms, and Spiders

"Crawl around and find what we can"

Continuously running program (robot, crawler, worm, spider) pursuing hyperlinks throughout Web

Start with set of documents. Identify new places to explore by looking at outbound links. Visit those links. Index most useful terms.
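The crawl-and-index loop can be sketched on a tiny made-up Web held in memory. The page names, their text, and the choice to index every word are assumptions for illustration; a real robot fetches pages over the network and is far more selective about which terms it indexes.

```python
from collections import deque

# Hypothetical Web: page name -> (text on the page, outbound links).
PAGES = {
    "a.html": ("weather forecast maps", ["b.html", "c.html"]),
    "b.html": ("radar maps", ["a.html"]),
    "c.html": ("hurricane forecast", ["d.html"]),
    "d.html": ("storm archive", ["gone.html"]),
}


def crawl(start_pages):
    """Follow links breadth-first; return (term -> pages index, links that did not work)."""
    index = {}
    dead_links = []
    seen = set(start_pages)
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        if page not in PAGES:
            dead_links.append(page)  # link to a page that no longer exists
            continue
        text, links = PAGES[page]
        for term in text.split():
            index.setdefault(term, set()).add(page)
        for link in links:
            if link not in seen:     # visit each page only once
                seen.add(link)
                queue.append(link)
    return index, dead_links


index, dead_links = crawl(["a.html"])
```

A search for "forecast" then becomes a lookup in the index rather than a fresh visit to every page, which is also why results show the Web as it was when the robot last passed by.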

Problems with Search Engines --

Hard to choose effective descriptive phrases for a search

Search Engines show Web as it was, not as it is

Links that don't work or have been moved

Streaming Audio/Video

At beginning of Web, Audio/Video files were "static"

File had to be downloaded to browser in its entirety before playing began

Such files could be very large. Delay could be lengthy.

Streaming audio/video begins playing almost immediately and can continue indefinitely
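The difference between the two approaches comes down to how much data must arrive before playing can begin. The sketch below models delivery as a sequence of small chunks; the 4-byte chunk size, the 20-byte "clip", and the function names are assumptions chosen to keep the numbers easy to follow.

```python
def chunks(media, chunk_size=4):
    """Yield the media piece by piece, as a network connection would deliver it."""
    for start in range(0, len(media), chunk_size):
        yield media[start:start + chunk_size]


def bytes_needed_static(media):
    """Static file: every chunk must be received before playing begins."""
    received = b"".join(chunks(media))
    return len(received)


def bytes_needed_streaming(media):
    """Streaming: playing can begin as soon as the first chunk has arrived."""
    first_chunk = next(chunks(media))
    return len(first_chunk)


clip = bytes(range(20))  # stand-in for an audio/video file
static_wait = bytes_needed_static(clip)
streaming_wait = bytes_needed_streaming(clip)
```

Because the player only ever asks for the next chunk, the same loop works when the source has no fixed end, as with a live event.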

Can be used for "live" events

Streaming audio/video player required. Can be launched by browser or embedded in browser window.

RealAudio, LiquidAudio, Streamworks, Shockwave Audio, RealVideo

Plug-Ins

Some content can be displayed inside (or outside) the browser window with the help of add-on software (plug-ins)

Java applets

Background sounds

Macromedia Flash images and video

Adobe Portable Document Format (PDF) -- requires Adobe Acrobat software

Document can be converted to PDF format while still maintaining original fonts, images, and layout

PDF very popular for email transmission of documents as well

Guidelines for Good Websites

Homepage of Website should be approximately one screen -- like cover of magazine

Links to subpages

Be consistent among pages at Website

General rule -- don't make long pages

Each page has link (buttons or images are nice) back to homepage

Each page has "last changed" date

Judicious use of background images and text colors

Judicious use of images

Design for multiple browsers

Laws, ethics, and good taste

Same laws as for publishing -- only publish what you own or have obtained permission to use

Can link to any Website, but do not steal material

Reference things taken in part from other sites and give URL

Avoid items in poor taste -- that you would not show to your mother