A data definition language (DDL) is a computer programming or scripting language that defines data structures. By definition, such a language must do three things: create, delete and modify data structures. What it does beyond those three tasks is immaterial, as long as it performs them. Any language that meets these criteria may be called a data definition language, but the term is most commonly applied to Structured Query Language (SQL) and to Extensible Markup Language (XML) schemas.
When the term data definition language came into use, it applied to a database model developed by the Conference on Data Systems Languages (CODASYL). The model defined two major areas of data structure development: the data definition language built the actual structure of the database, and the data manipulation language defined the methods of placing data in that structure. Since then, both terms have broadened into generic names for the processes they cover.
The generic terms now apply to any language that performs the original functions. Both SQL and XML schemas perform all of the required tasks and provide many features that were absent from the original model because they had not yet been invented. Other languages provide these capabilities as well; they are simply used much less often.
To qualify as a data definition language, a language needs to provide three functions. The first main function is the construction of data structures; in most cases, these are tables designed to hold specific groups of information. They often resemble a spreadsheet holding pages of cross-referenced information. For instance, a sheet may list a business’s customers down one side and the available products along the top. The cells of the table would then record when each customer purchased each product.
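As a rough illustration of the construction function, the sketch below uses Python's built-in sqlite3 module to run SQL CREATE TABLE statements. The table and column names (customers, products, purchases and their fields) are hypothetical, and the spreadsheet-style grid described above is modeled here as three related tables, which is one common choice rather than the only possible layout.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# DDL: define three structures. All names here are hypothetical.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        title      TEXT NOT NULL
    );
    CREATE TABLE purchases (
        customer_id  INTEGER NOT NULL REFERENCES customers(customer_id),
        product_id   INTEGER NOT NULL REFERENCES products(product_id),
        purchased_on TEXT NOT NULL
    );
""")

# List the structures that now exist; no rows have been stored yet.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customers', 'products', 'purchases']
```

Note that these statements only define where data will live; placing rows into the tables is the job of data manipulation statements such as INSERT.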
The next main function is the deletion of data structures. This is not the same as the deletion of an entire database or file; it is a much more selective process. It may remove a specific page of information or an entire portion of a multidimensional array. In either case, the data must be removed without affecting other data structures, even if they are all held in the same file.
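A minimal sketch of selective deletion, again using Python's sqlite3 module with hypothetical table names: one table is dropped while a second table in the same database keeps its contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two structures held in the same database; names are hypothetical.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE old_promotions (promo_id INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO customers (name) VALUES ('Ada');
""")

# DDL: remove one structure only, leaving the rest of the database alone.
conn.execute("DROP TABLE old_promotions")

remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
customer_rows = conn.execute("SELECT name FROM customers").fetchall()
print(remaining)      # ['customers']
print(customer_rows)  # [('Ada',)]
```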
The last main function is the alteration of a data structure. This is a broad category that covers many situations. A table may have columns added or renamed, or an entire database may need to be split into two separate databases. In every case, the structure must be altered in such a way that no information is lost, destroyed or created during the process. This prevents anomalous information from entering the data system.
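The column changes mentioned above can be sketched with SQL ALTER TABLE statements, run here through Python's sqlite3 module. The table and column names are hypothetical, and the RENAME COLUMN form assumes the underlying SQLite library is version 3.25 or newer. The final query checks that the existing row survives the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical starting structure with one stored row.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada')")

# DDL: add a new column, then rename an existing one.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
conn.execute("ALTER TABLE customers RENAME COLUMN name TO full_name")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
rows = conn.execute(
    "SELECT customer_id, full_name, email FROM customers").fetchall()
print(columns)  # ['customer_id', 'full_name', 'email']
print(rows)     # [(1, 'Ada', None)]
```

The stored name is still present under the renamed column, and the added column is empty (NULL) for rows that existed before the change, so nothing was lost or invented by the alteration.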