Content Management Glossary

Internet Basics
Protocol
A set of rules that lets computers agree how to communicate over the Internet.

TCP/IP
(Transmission Control Protocol/Internet Protocol) -- This is the suite of protocols that defines the Internet. Originally designed for the UNIX operating system, TCP/IP is now natively supported on all major platforms.

Internet
The Internet is a collection of networks that provide worldwide connectivity based on the IP protocol. World Wide Web servers and browsers are among the most popular applications on the Internet.

Intranet
An intranet is a privately owned network that makes use of Internet technology and applications to meet the needs of an enterprise.

Extranet
A network connection to a partner's network using secure IP and other Internet protocols to do business.

FTP
(File Transfer Protocol) -- A common method of moving files between two Internet sites. FTP is a special way to log in to another Internet site for the purposes of retrieving and/or sending files. Publicly accessible repositories of material can be accessed via FTP by logging in using the account name "anonymous". These sites are called anonymous FTP servers.

HTTP
(HyperText Transfer Protocol) -- The protocol for moving hypertext files across the Internet. Requires an HTTP client program on one end, and an HTTP server program on the other end.

Host
Any computer on a network that is a repository for services available to other computers on the network. It is common to have one host machine provide several services, such as WWW and USENET.

Domain Name
The unique name that identifies an Internet site. Domain names always have two or more parts, separated by dots. The part on the left is the most specific, and the part on the right is the most general. A given machine may have more than one domain name but a given domain name points to only one machine. For example, the domain names:
edit-x.com
mail.edit-x.com
workshop.edit-x.com
can all refer to the same host, but each domain name can refer to no more than one host.

URL
(Uniform Resource Locator) -- The standard way to give the address of any resource on the Internet that is part of the World Wide Web (WWW). A URL looks like this:
http://www.edit-x.com/seminars.html

Port
Often refers to a number that is part of a URL, appearing after a colon (:) right after the domain name. Every service on an Internet server listens on a particular port number on that server. Most services have standard port numbers, e.g. Web servers normally listen on port 80. Services can also listen on non-standard ports, in which case the port number must be specified in a URL when accessing the server, so you might see a URL of the form:
  http://www.edit-x.com:1080/index.html
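
As a rough illustration of how the port fits into a URL, the sketch below uses Python's standard urllib.parse module to pull out the host name and port, falling back to a standard port when the URL does not name one. The helper name host_and_port and the DEFAULT_PORTS table are illustrative assumptions, not part of any standard.

```python
from urllib.parse import urlsplit

# Standard ports for a few common services (illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def host_and_port(url):
    """Return (hostname, port), using the scheme's standard port if none is given."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port
```

Called with the non-standard-port URL above, the helper returns ("www.edit-x.com", 1080). For http://www.edit-x.com/seminars.html it returns ("www.edit-x.com", 80).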

Web Basics
GIF
(Graphics Interchange Format) -- A common format for image files, especially suitable for images containing large areas of the same color. GIF files of simple images are often smaller than the same image would be if stored in JPEG format, but GIF does not store photographic images as well as JPEG.

JPEG
(Joint Photographic Experts Group) -- JPEG is most commonly used as a format for image files. JPEG format is preferred to the GIF format for photographic images as opposed to line art or simple logo art.

EPS
(Encapsulated PostScript) -- A format used for storing graphics. EPS allows one to embed a file within a PostScript file. EPS files, like other PostScript files, are very good for printing purposes. EPS files, however, tend to be rather large.

PDF
(Portable Document Format) -- The file format used by Adobe's Acrobat. This is a proprietary format that is accessible only by the Acrobat Reader, which is available at no cost on Adobe's WWW site.

TIFF
(Tagged Image File Format) -- A format used for storing graphics. A TIFF file essentially contains bitmapped information, but because compression can be applied, the files are typically smaller than EPS versions.

HTML
Hypertext Mark-up Language. HTML is not a programming language, but a way to format text by placing marks around the text. For example, HTML allows you to make a word bold or underline it. HTML is the foundation of most Web pages.

HTTP
Hypertext Transfer Protocol. A protocol that tells computers how to communicate with each other. Most URLs begin with http://

HTTP Server (Web server)
Application that serves Web content. Users make a request to a Webserver from their browser when they click on a hyperlink in an HTML page. This GET request specifies the filename the user is requesting. The Webserver uses it to fetch the file from the file system and return it to the user's browser for rendering and display.
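
The mapping from a GET request to a file can be sketched roughly as follows. This is a simplified illustration, not how any particular Webserver is implemented. The function name resolve_request, the /var/www document root, and the index.html default are all assumptions made for the example.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def resolve_request(request_line, document_root="/var/www"):
    """Map a request line such as 'GET /seminars.html HTTP/1.0' to a file path.

    Returns None for anything that is not a well-formed GET request.
    """
    fields = request_line.split()
    if len(fields) != 3 or fields[0] != "GET":
        return None
    path = unquote(urlsplit(fields[1]).path)
    # Collapse '.' and '..' segments so the result stays under the document root.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    if clean == "/":
        clean = "/index.html"  # assumed default page for the site root
    return document_root.rstrip("/") + clean
```

A real server would then open the resulting path, send its contents back with the appropriate headers, and return an error page if the file does not exist.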


Network Architecture Basics
Application server
An application server runs on both the development and the production servers. The application server works in conjunction with the Web server to serve content to end users. When a Web page is requested by a Web browser, the Webserver will "hand off" the page request to the application server for files with particular file extensions. For example, a Webserver will natively handle all pages with the extension *.html, but hand off all pages with a *.cfm extension to the Allaire ColdFusion application server. When a file is handed off to the application server, the Web server reads the file requested from disk, passes it to the application server for processing, and then waits to hear back from the application server before returning content to the end user.

The application server itself will read the file and look for special commands included in the file. These commands are specific to each vendor's application server solution. These commands are then executed by the application server, and include (a) making query calls to a database to look up a user profile, look up metadata, and then dynamically generate a Web page and (b) executing an entire sequence of transactions to process an online purchase.

Application servers are used because they provide an easier, more scalable means for developers to build complex Web applications than coding CGI scripts. Application servers make it easier to build dynamic Web pages and conduct online commerce by reducing the development cost and transaction cost of querying a database and by providing transactional integrity with session failover, redundancy, and more.
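
The hand-off by file extension described above can be sketched in a few lines. This is a toy model only. The render_template function stands in for an application server and fills {name} placeholders, which is not ColdFusion's actual tag syntax, and the HANDLERS table is a hypothetical configuration.

```python
import os


def render_template(source, context):
    """Stand-in for an application server: fill {name} placeholders in the file."""
    return source.format(**context)


# Hypothetical configuration: which extensions are handed off, and to what.
HANDLERS = {".cfm": render_template}


def serve(filename, source, context):
    """Return the text sent to the browser for one page request."""
    extension = os.path.splitext(filename)[1].lower()
    handler = HANDLERS.get(extension)
    if handler is None:
        return source  # static file: returned to the browser unchanged
    return handler(source, context)  # handed off for processing first
```

With this sketch, a request for index.html returns the file as stored, while a request for welcome.cfm is processed before the result is returned.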

Server
A computer, or a software package, that provides a specific kind of service to client software running on other computers. The term can refer to a particular piece of software, for example a Web server, or to the machine on which the software is running. A single server machine could have several different server software packages running on it, thus providing many different servers to clients on the network.

Client
A software program that is used to contact and obtain data from a server software program on another computer. Each client program is designed to work with one or more specific kinds of server programs, and each server requires a specific kind of client. A Web browser is one type of client.

Development server
The server computer on which Website development is performed. This server sits behind the corporate firewall. All content development, testing, and QA occurs on the development server. Final content is deployed from the development server to the production server.

Production server
The server computer that sits outside the corporate firewall. External audiences are served content by a Webserver running on a production server. Production servers are highly optimized to deliver content quickly to a large number of users. Production servers are tuned by (1) ensuring that extraneous applications and processes do not run on them, (2) using specialized servers to serve different types of content (for example, one server to serve all media assets, and another to run all CGI scripts), and (3) setting up multiple production servers to serve different geographic regions.

Web Farm
A cluster of production servers in a given geographic location. This cluster of servers is typically coordinated by special load balancing software. Designed to increase the amount of traffic a Website can handle.

Co-location
Most often used to refer to having a server that belongs to one person or group physically located on an Internet-connected network that belongs to another person or group. Usually this is done because the server owner wants his computer to be on a high-speed Internet connection and/or he does not want the security risks of having the server on his own network.

Load Balancing
A specialized function provided by third-party application providers. Load-balancing software is designed to distribute Web content requests for a single URL across any number of production servers. For example, a user requesting www.yahoo.com will access a single server running specialized load-balancing software that forwards the request and all subsequent requests to a second server. In this manner, multiple production servers can be used to serve Web content to a larger audience base.
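
One simple way to picture this behavior is a round-robin assignment that remembers which server each visitor was given. The sketch below is an illustration under that assumption. Commercial load balancers use a variety of strategies, and the class name StickyRoundRobin and the server names are made up for the example.

```python
import itertools


class StickyRoundRobin:
    """Assign each new client the next server in turn, then keep it there."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one production server is required")
        self._cycle = itertools.cycle(list(servers))
        self._assigned = {}

    def route(self, client_id):
        """Return the server that should handle this client's request."""
        if client_id not in self._assigned:
            self._assigned[client_id] = next(self._cycle)
        return self._assigned[client_id]
```

With two servers, the first visitor is sent to the first server, the second visitor to the second, and each visitor's later requests go back to the same server.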

RAID
RAID (Redundant Array of Inexpensive (or Independent) Disks) is a storage mechanism that uses several optical or magnetic disks working in tandem to increase I/O bandwidth and to provide redundancy.


Web Technologies
Cookie
Cookies are used to store state and user preference information for more interactive Website experiences. Because Webservers do not keep track of content sent to different users, this information is stored and accessed using cookies. A cookie is a piece of information sent by a Web server to a Web browser that the browser software is expected to save and send back to the server whenever the browser makes additional requests from the server.

Cookies might contain information such as login or registration information, online "shopping cart" information, or user preferences. When a server receives a request from a browser that includes a cookie, the server is able to use the information stored in the cookie. For example, the server might customize what is sent back to the user, or keep a log of a particular user's requests.

Depending on the type of cookie used, and the browser's settings, the browser may or may not accept the cookie, and may save the cookie for either a short time or a long time. Cookies are usually set to expire after a predetermined amount of time and are usually saved in memory until the browser software is closed down, at which time they may be saved to disk if their "expire time" has not been reached.
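
The round trip can be sketched with Python's standard http.cookies module: the server builds the value of a Set-Cookie header, and later reads back the Cookie header the browser returns. The helper names and the sample cookie names (theme, cart) are illustrative assumptions.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, max_age_seconds):
    """Build the value of a Set-Cookie header that expires after max_age_seconds."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age_seconds
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()


def read_cookies(cookie_header):
    """Parse the Cookie header a browser sends back into a plain dictionary."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {key: morsel.value for key, morsel in cookie.items()}
```

For example, read_cookies("theme=dark; cart=3") gives the server a dictionary with the visitor's saved theme and cart count.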

Scripts
Scripts are small programs that run on both the development and production servers. Scripts are written in interpreted languages such as PERL and TCL. An interpreted language is translated into machine instructions as the code executes, rather than ahead of time. Because of this on-the-fly translation, interpreted programs generally execute more slowly than compiled programs written in C or C++ (programs that are translated in advance into non-readable binary code that the microprocessor executes directly).

Scripts are usually used for simple, lightweight applications and are typically much easier to write than a standard program written in C or C++.

CGI
A set of rules that describe how a Web server communicates with another piece of software on the same computer, and how the other piece of software (the "CGI program") talks to the Web server. Any piece of software can be a CGI program if it handles input and output according to the CGI standard.

CGI Scripts
CGI (Common Gateway Interface) scripts are scripts run on the server side that process user data, insert or retrieve information for either a database or a file, and return results to the Webserver. CGI scripts are most commonly written in PERL, though they can be written in any scripting language. CGI scripts are typically used to process Web forms: taking data entered by the end user, processing it, and dynamically writing HTML code on-the-fly to be returned to the end user's browser.
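
A minimal sketch of the form-processing pattern just described, written in Python rather than PERL. It assumes the form data arrives in the QUERY_STRING environment variable, as the CGI standard specifies for GET requests. The function name cgi_response and the "name" form field are hypothetical.

```python
import html
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build a CGI response from the environment a Web server would supply."""
    form = parse_qs(environ.get("QUERY_STRING", ""))
    name = form.get("name", ["visitor"])[0]
    # Escape user input before writing it into the generated HTML.
    body = "<html><body><p>Hello, {}!</p></body></html>".format(html.escape(name))
    # A CGI program writes its headers, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body
```

A real CGI program would pass os.environ to this function and write the result to standard output, where the Webserver picks it up.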

cgi-bin
The most common name of a directory on a Web server in which CGI programs are stored. The "bin" part of "cgi-bin" is a shorthand version of "binary", because once upon a time, most programs were referred to as "binaries". In real life, most programs found in cgi-bin directories are text files: scripts that are executed by binaries located elsewhere on the same machine.

PERL
An interpreted language used for the development of CGI scripts. PERL provides easy means for Web developers to process text strings provided by the Webserver according to the CGI standard. The vast majority of scripted programs on Websites running on the UNIX operating system are written in PERL.

Server-side includes (SSI)
Content that sits outside an HTML file that is dynamically included by the Webserver upon page request. For example, a user requests a page, index.shtml. The Webserver is configured to handle files with the extension *.shtml differently than files with a normal *.html extension. For files with a normal *.html extension, the Webserver retrieves the file from the file system and returns it to the user's browser for rendering and display. For files with a *.shtml extension, the Web server fetches the file from the file system and, before returning it to the end user, reads the file and looks for special directives of the form:
  <!--#command parameter="value" -->
For each directive found, the Webserver will either (a) take a file from a specified file system location, open it, and insert it into the Web page before sending it to the end user or (b) execute a script on the server and take the results of the script and insert them into the Web page before sending it to the end user. Server-side includes are used to make content modular so that (a) one content component can be included on multiple Web pages and (b) content components can be changed in one place and automatically be reflected in all Web pages.
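
The file-inclusion case can be sketched as a simple text substitution. The example assumes the Apache-style directive syntax <!--#include file="..." --> and takes the included fragments from a dictionary rather than reading them from disk. The function name expand_includes is illustrative.

```python
import re

# Matches directives such as <!--#include file="nav.html" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page, fragments):
    """Replace each include directive with the named fragment's text.

    `fragments` maps file names to their content; a real server would
    read each file from the file system instead.
    """
    def substitute(match):
        # Leave the directive untouched if the fragment is unknown.
        return fragments.get(match.group(1), match.group(0))

    return INCLUDE.sub(substitute, page)
```

Because every page pulls the fragment in at request time, changing nav.html once changes the navigation on every page that includes it.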

SQL
(Structured Query Language) -- A specialized programming language for sending queries to databases. Most industrial-strength and many smaller database applications can be addressed using SQL. Each specific application will have its own version of SQL implementing features unique to that application, but all SQL-capable databases support a common subset of SQL.


ICE
ICE (Information and Content Exchange) is a protocol aiming to develop a consistent vocabulary for describing and managing the exchange of content and electronic assets.

Java
Java is a simple, object-oriented, distributed, interpreted, robust, secure, architecture neutral, portable, high-performance, multithreaded, and dynamic language and software platform.

Java is a network-oriented programming language invented by Sun Microsystems that is specifically designed for writing programs that can be safely downloaded to your computer through the Internet and immediately run without fear of viruses or other harm to your computer or files. Using small Java programs (called "applets"), Web pages can include functions such as animations, specialized calculators, and other programs.

JavaBeans
JavaBeans are the component architecture for the Java platform, which is a lightweight component architecture that enables a developer to assemble an application that is written only once, runs anywhere, and consists of components that can be reused everywhere.

ActiveX
ActiveX is a combination of software components and technologies in the Microsoft Windows environment.

COM/COM+
COM is a binary interoperability specification and communication convention for software components in the Microsoft Windows environment. COM+ is the next generation of COM, which provides functions to automatically translate C/C++ language functions into supporting software components framework.

DCOM
DCOM (Distributed COM) is a model that enables COM objects to invoke and to communicate with remote objects.

CORBA
CORBA (Common Object Request Broker Architecture) is an architecture, published by the OMG (Object Management Group), that allows integration of a wide variety of object systems.

ORB
ORB (Object Request Broker) is a component in CORBA. ORB is responsible for all of the mechanisms required to find the object implementation for the request, to prepare the object implementation to receive the request, and to communicate the data making up the request.

XML
XML (Extensible Markup Language) is a standard for creating markup languages. Using XML, new tags or new metadata structure can be created by one application and interpreted by another application.

XML is a specification put forth by the World Wide Web Consortium (W3C). XML is an offshoot of Standard Generalized Markup Language (SGML) but XML is much easier to use and apply. XML allows Web developers to design their own customized tags to provide functionality that is not available with HTML. The combination of XML and HTML permits powerful and content-rich Web sites. For example, a Web site using XML tags can link to multiple documents using a single HTML hyperlink. Microsoft Internet Explorer 4.01 and Netscape 5.0 support XML. Microsoft Office 2000 uses XML tags to make HTML a standard Office format.

DDE
Dynamic Data Exchange


Products and Technology
Metadata
Metadata is data about data. Metadata is commonly used to identify information that describes a Web asset, most typically an HTML file. Metadata that describes an HTML file might include the name of the author, the language the file is written in, the source of the file, the keywords that describe the file, and the audience the content is targeted for. Metadata is typically included in the HTML code of a given Web page and written in the following form:

  <META NAME="author" CONTENT="...">
  <META NAME="keywords" CONTENT="...">

Metadata written in an HTML file using META tags can be indexed by search engines (Verity, UltraSeek, Yahoo, AltaVista, etc.). These indices allow a Web developer to create a CGI script that can take a list of keywords a user is searching for, query the index, and return a list of hyperlinks to relevant Web pages. The indices created by the search engine can be thought of as mini-databases highly specialized for simple text string searches.

Metadata is also typically stored in a standard relational database (Oracle, Sybase, Informix, SQL Server). Highly dynamic Websites make use of metadata stored in a relational database to determine what content to display when a page request is made by an end user.
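
As a rough sketch of how a program might read META tags out of a page before indexing them or loading them into a database, the example below uses Python's standard html.parser module. The class and function names are illustrative and do not come from any particular search engine.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from the META tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name and content is not None:
            self.metadata[name.lower()] = content


def extract_metadata(html_text):
    """Return a dictionary of the metadata found in html_text."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata
```

The resulting dictionary could then be written to an index or inserted as a row in a relational database table.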

Schema
A conceptual structure of how a digital asset shall be organized.


Security Basics
SSL
(Secure Sockets Layer) -- A protocol designed by Netscape Communications to enable encrypted, authenticated communications across the Internet. SSL is used mostly (but not exclusively) in communications between Web browsers and Web servers. URLs that begin with https indicate that an SSL connection will be used.

SSL provides three important things: privacy, authentication, and message integrity.

In an SSL connection the server sends its security certificate to the client, and the server may ask the client to send a certificate in return. The two sides then use information from the certificate exchange to agree on encryption keys, ensuring that only the intended recipient can decrypt what is sent, that each side can be sure the data came from the place it claims to have come from, and that the message has not been tampered with.

Security Certificate
A piece of information (often stored as a text file) that is used by the SSL protocol to establish a secure connection.

A security certificate contains information about who it belongs to, who it was issued by, a unique serial number or other unique identification, valid dates, and an encrypted "fingerprint" that can be used to verify the contents of the certificate.

In order for an SSL connection to be created, the server must have a valid security certificate. A client certificate is needed only when the server requests one.

Certificate Authority
An issuer of security certificates used in SSL connections.


Dynamic Content and Personalization
Behavior Tracking
The process of observing a customer's behavior as he clicks through the Website and storing that click-through information in a user profile database. Behavior tracking enables business managers to target content with tailored business rules.

Business Rules
A business rule is typically referenced in conjunction with personalization engines. Business rules determine which users are delivered a specific type of content. Business rules typically use a Broadvision or ATG supplied interface to match content with user groups. A sample business rule: any user working in the high-tech industry should see content related to new Web technologies. Business rules can be stored in a database (Broadvision) or file system (Microsoft, ATG) and are evaluated at run-time by a personalization engine.
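
The sample rule above can be pictured as data that is evaluated against a user profile at run-time. The sketch below is a deliberately simple model and does not reflect how Broadvision, ATG, or Microsoft products store or evaluate rules. The RULES table, attribute names, and category names are all hypothetical.

```python
# Each rule: (profile attribute, required value, content category to deliver).
RULES = [
    ("industry", "high-tech", "new-web-technologies"),
    ("interest", "hockey", "hockey-articles"),
]


def categories_for(profile, rules=RULES):
    """Return the content categories whose rule matches the user profile."""
    return [
        category
        for attribute, value, category in rules
        if profile.get(attribute) == value
    ]
```

A user whose profile records the high-tech industry would be matched to the new-web-technologies category, and a user with no matching attributes would receive an empty list.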

Call-outs, call-backs
Call-outs and call-backs are triggers from a server application to an external program and vice versa. Call-outs typically occur during a workflow process where the application logic for a specific task that must execute is contained within a separate executable program. An example of this would be a call-out to an external links checker during a submit process to check for broken links.

Categorization
The process of assigning metadata to content. Categorizing content includes determining whether a content element is related to, for example, Sports or Finance, Hockey or Stocks. Metadata associated with categorized content is used to generate navigational links to relevant content (for example, a list of links to all Hockey articles for a self-described Hockey fan).

Content Targeting
The process of either (a) defining business rules about which customer segments should receive which content or (b) categorizing a particular content element so that it is available to a particular customer audience.

Content Delivery
Serving of Web assets by a Web server to an end user. Content delivery is typically used in conjunction with Web content that is generated on-the-fly by either the Webserver itself (server-side includes), an application server (general database queries), or a personalization engine (specific database queries to both user profile databases and content databases for content matching).

Database
A data storage mechanism managed independently of the operating system by server applications. The applications can either store and retrieve data natively from disk or store and retrieve data from a file system object. Data stored within databases are only accessible from database application interfaces. Databases are designed for rapid, efficient search and queries for structured data.

Database schema
The overall structure of the database tables that store information: user profile data, content metadata, or pure structured information. In the simplest case, a database schema has a single database table of user information. Each record (row) within this table might represent a unique customer, with each field (column) representing relevant customer information (address, city, phone number, etc.). More complex schemas would involve multiple database tables related to one another through a common unique identifier. Such relational database tables are necessary for more complex data schemas for performance and easier administration.
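
A small illustration of two tables related through a common identifier, using the SQLite database that ships with Python. The table names, column names, and sample rows are invented for the example.

```python
import sqlite3

# Two tables related through customer_id (hypothetical schema).
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item        TEXT NOT NULL
);
"""


def build_database():
    """Create an in-memory database with the schema and a few sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO customer VALUES (1, 'Ann', 'San Francisco')")
    db.executemany(
        "INSERT INTO purchase (customer_id, item) VALUES (?, ?)",
        [(1, "book"), (1, "lamp")],
    )
    return db


def items_for(db, customer_name):
    """Follow the shared identifier from a customer to that customer's purchases."""
    rows = db.execute(
        "SELECT p.item FROM purchase AS p "
        "JOIN customer AS c ON c.customer_id = p.customer_id "
        "WHERE c.name = ? ORDER BY p.item",
        (customer_name,),
    )
    return [item for (item,) in rows]
```

Keeping purchases in their own table means a customer's address is stored once, however many purchases that customer makes.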

Dynamic Content
Content that is updated frequently and is fresh and relevant for its appropriate audience. Dynamic content can include content served as a flat HTML page that is updated many times a day, content that includes sophisticated Javascript or Shockwave for an interactive experience, or content that is generated on-the-fly from either a file-system or a database using server-side includes, CGI scripts, Java servlets, or an application server.

File System
A data storage mechanism natively managed by the server operating system. File systems allow operating systems to store and retrieve data from disk. Data is stored on disk logically categorized into directories following a file cabinet metaphor. File systems are designed for rapid, efficient, scalable disk I/O for most common forms of saved data.

Index Engine
A server application that "walks" a file system, reads every text file, and builds a mini-database of content elements (most typically content metatags). Examples of index engines include Web crawlers used by Excite, Lycos, and Altavista to index the Web. Other examples include index engines used by ATG, Microsoft, NetPerceptions, Verity, and Ultraseek for both search and content delivery.
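
The "walk and index" idea can be sketched with Python's os.walk. This toy version maps every word to the files that contain it. The choice of file extensions and the function names are assumptions, and commercial index engines are far more sophisticated.

```python
import os
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(root):
    """Walk `root` and map each word to the set of text files containing it."""
    index = defaultdict(set)
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith((".txt", ".html")):
                continue  # only index files assumed to hold text
            path = os.path.join(directory, filename)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for word in WORD.findall(handle.read().lower()):
                    index[word].add(path)
    return index


def search(index, *keywords):
    """Return the sorted paths of files that contain every keyword."""
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    return sorted(set.intersection(*matches)) if matches else []
```

A search script can then answer a keyword query from the index alone, without rereading the files.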

Personalization
The process of matching categorized content with different end users based on business rules. This personalization process occurs upon page request to a Webserver and is handled by either (a) a general application server, (b) a specialized one-to-one application server, or (c) a specific personalization engine.

Pre-event trigger
A call-out to an external program prior to an action completing within the server application. The pre-event trigger supplies the external program with the information it needs to execute properly and to return completion status information to the calling server process.

Query
A call to a database to retrieve a set of information. Typically made via a Structured Query Language (SQL) call, an industry standard for relational database queries. In the Web world, database queries are specified in an HTML page and executed by an application server, which formats the information retrieved from the database into HTML to return to the end user. A sample query would be to select from a database customer table all customers with city address equaling "San Francisco".
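
The sample query can be written out and run against SQLite from Python, as sketched below. The customer table layout and the sample rows are assumptions, and the HTML formatting step is a bare-bones stand-in for what an application server would do.

```python
import html
import sqlite3


def demo_database():
    """Create an in-memory customer table with a few hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (name TEXT, city TEXT)")
    db.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [("Ann", "San Francisco"), ("Bob", "Boston"), ("Cy", "San Francisco")],
    )
    return db


def customers_in_city(db, city):
    """Select all customers whose city equals the given value."""
    rows = db.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


def as_html_list(names):
    """Format the query result as an HTML list for the end user's browser."""
    items = "".join("<li>{}</li>".format(html.escape(name)) for name in names)
    return "<ul>{}</ul>".format(items)
```

Passing the city as a parameter (the ? placeholder) rather than pasting it into the SQL text keeps user input from being interpreted as part of the query.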

Replication
Deployment of a body of content from one server machine to another. Replication typically involves specific rules about what gets deployed where, and often represents a complete synchronization of two independent bodies of content.

Rich HTML
HTML pages that are complex in data, design, and presentation. Rich HTML pages contain any number of image or media assets, have dynamic content sections, have complex layout that is unique to that page, and have random collections of fresh, relevant content that is not categorized and stored in a relational database.

Time-outs
Time-outs are conditional tasks that execute when a certain task has not been completed for a given period of time. Time-outs are typically used in conjunction with workflow approvals. If a user has not approved a file within a set amount of time, the approval task will time out and execute a conditional workflow task to proceed forward with the production process - either emailing the user as a reminder, escalating the approval to a senior manager, or simply proceeding with the deployment of the unapproved content.
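
The decision behind an approval time-out can be sketched as a small function. The two-day limit, the action names, and the choice to escalate (rather than remind or deploy) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def approval_action(requested_at, now, approved, timeout=timedelta(days=2)):
    """Decide what the workflow should do with a pending approval task."""
    if approved:
        return "proceed"
    if now - requested_at >= timeout:
        return "escalate"  # the approval task has timed out
    return "wait"
```

A workflow engine would run a check like this periodically for every open approval task.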

User Profile
A mini-database of information about a user's content preferences, market segmentation, and past visit and purchase behavior. One example is information entered into MyYahoo! to customize preferences stored in a user's profile database. Used to determine what navigational links and content to store for a particular site visitor.


Other
Auditing
Auditing records the sequence of activities that occur on any given file or body of content within a content management system. This sequence of events, an audit trail, can be used by content contributors, managers, and auditors alike to determine how and why content was changed.

Task
A task is a unit of work within a workflow. Workflows are composed of multiple tasks which can be executed serially, in parallel, or on a conditional basis. Examples of tasks include creation/editing of a variety of Web assets, approval of a set of modified content, automatic link checking of edited HTML content, automatic email reminders of past due dates, and timed deployment to a bank of production servers.

Workflow
Workflow is a set of interdependent tasks that occur in a specific sequence. Examples of workflow include automated routing of content for approval and automated integration of Web content for publication and deployment to a bank of production servers.

Acrobat
Adobe's newest page description language and related applications software. Acrobat allows the text portions of its files to be searched, and it adds a variety of other tools that are useful in information handling environments. The file format Acrobat uses is called PDF (portable document format). Because Acrobat/PDF is a proprietary format, one must have an Acrobat Reader (a run version of Acrobat) and Acrobat utilities in order to use any such files.

DTD
(Document Type Definition) -- The document map for tailored SGML applications. The DTD defines the structure and elements of a particular document style. For example, a book would have one DTD while an article would have a different DTD, since the sections and content of these two types of publications differ. It is a flexible document map that one can tailor for one's own applications.

Parsing
The process of checking an SGML formatted document to ensure it has met all the rules of both SGML and the DTD that is being used. Technically, a document is not considered to be SGML until it has been successfully parsed, as defined by the ISO Standard for SGML.

PostScript
Adobe's original page description language. PostScript files contain images of full pages (mixed text and graphics). The black-and-white image pages are scalable, but not searchable. PostScript files are more compressed than bitmaps, so they take up much less storage space. PostScript files are excellent for printing purposes -- high resolutions are retained.

SGML
(Standard Generalized Markup Language) -- A markup language standard used to define content and structure in documents. SGML is not a programming language, but it has its own protocol and syntax, much as programming languages do.

SGML Formatted Document
A document that has had SGML-defined tags and SGML protocol applied. For example, the title, authors, abstract, headings, references, figure captions, etc. have been tagged using the protocol defined by SGML.

WWW
(World Wide Web) -- A special browser-based network on the Internet. Computers on the Internet that are running browser-based services are called "Web Servers." The subset of Web Servers on the larger network of Internet-linked computers constitutes the WWW.

WYSIWYG
What You See Is What You Get.