Data Structures(Subject Code: BTCOC303) Unit 1 Introduction Notes

Question 1: What is Data?

Answer: Data can be defined as a representation of facts, concepts, or instructions in a formalized manner, which should be suitable for communication, interpretation, or processing by human or electronic machine. Data may or may not have any meaning.

• Data is the raw, unorganized facts that need to be processed.

• Data is the name given to basic facts and entities such as names and numbers.

• When data are processed, organized, structured or presented in a given format to make them useful, they are called Information.

• Information provides context for data. For example, a list of dates — data — is meaningless without the information that makes the dates relevant (dates of holiday). Data is represented with the help of characters such as alphabets (A-Z, a-z), digits (0-9) or special characters (+,-,/,*,<,>.)

=========================================================================

Question 2: What is information?

Answer: Information is organized or classified data, which has some meaningful values for the receiver. Information is the processed data on which decisions and actions are based. When data are processed, organized, structured or presented in a given format to make them useful, they are called Information. Information provides context for data. For example, a list of dates — data — is meaningless without the information that makes the dates relevant (dates of holiday).

=========================================================================

Question 3: What are Data types?

Answer: Data types specify how we enter data into our program variables and what type of data we enter into it. C language has some predefined set of data types to handle various kinds of data that we can use in our program. These data types have different storage capacities.

C language supports 2 different type of data types:

1. Primary data types: These are fundamental data types in C namely integer (int), floating point (float), character (char) and void.

2. Derived data types: Derived data types are nothing but primary data types but a little twisted or grouped together like array, structure, union and pointer. These are discussed in details later.

Data type determines the type of data a variable will hold. If a variable x is declared as int. it means x can hold only integer values. Every variable which is used in the program must be declared as what data-type it is.

=========================================================================

Question 4: What is Data structure?

Answer: In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data. Data Structures is about for better organization and storage. We can store student data like their rollno, name and address.

· We can organize this data as a record like student record.

· Data Structures are used to store ordered data, so that various operations can be performed on it easily.

· Anything that can store data can be called as a data structure, hence Integer, Float, Boolean, Char etc, all are data structures. They are known as Primitive Data Structures.

Types of Data Structures

=========================================================================

Question 5: What is Abstract Data Type (ADT)?

Answer: Concept of “Abstraction” allows to consider the high-level characteristics of something without getting down in the details. For example: Process abstraction in procedural programming like C, we can use (pre-defined) functions without concern how they really works inside. Some example of Abstract Data Structure (ADT) are: Linked List, Tree, Graph, Stack, Queue etc.

Data Abstraction means we know what a data type can do and how it is done is hidden. Abstract Data Type (ADT) defines a particular data structure in terms of data and operations. It offers an interface of the objects (instances of an ADT).

An ADT consists of

- Declaration of data

- Declaration of operations

- Encapsulation of data and operations: data is hidden from user and can be manipulated only by means of operations.

=========================================================================

Question 6. How information or data is stored? Explain representation of Information.

Answer: Two basic ways to store and access information is via print (like a newspaper, book) or electronically (reading that same newspaper on your computer). With a wide range of businesses increasingly keeping office, client, customer and patient information electronically raises the question: how data actually stored?

All electronic data is reduced to seemingly endless lines of 1s and 0s called binary code. Each individual 1 or 0 is referred to as a ‘bit.’ The smallest number of bits that can be read electronically is eight (i.e., 01001010). This is referred to as a ‘byte’ and each one holds only a tiny amount of information. About a thousand kilobytes is called a megabyte and about a thousand megabytes is referred to as a gigabyte.

1 kilobyte = 1,024 bytes
1 megabyte = 1,048,576 bytes
1 gigabyte = 1,073,741,824 bytes

This data can be stored in three basic ways:
1) Magnetic storage
2) Optical discs
3) Solid State Drive (SSD) or Flash storage

Magnetic storage is commonly used on the hard disc drives found on most computers. Information is stored using positive and negative magnetic charges to correspond with the 1s and 0s noted above. Optical discs like CDs and DVDs store information as binary code that can be read by an optical sensor in a disc drive. Flash/SSD technology – commonly found in smartphones, USB drives and some laptops – stores that same information electronically. Each system has different characteristics, but they can all communicate with each other since they all store the underlying information they hold as binary code. Thus, electronic information can be transferred easily between computers, smartphones, tablets, USB drives and other electronic devices.

====================================================================

Question 7. What is Algorithm? Explain characteristics of algorithm.

Answer: An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain predefined task. Algorithm is not the complete code or program, it is just the core logic (solution) of a problem, which can be expressed either as an informal high level description as pseudocode or using a flowchart.

An algorithm should have the following characteristics −

Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or phases), and their inputs/outputs should be clear and must lead to only one meaning.

Input − an algorithm should have 0 or more well-defined inputs.

Output − an algorithm should have 1 or more well-defined outputs, and should match the desired output.

Finiteness − Algorithms must terminate after a finite number of steps.

Feasibility − should be feasible with the available resources.

Independent − an algorithm should have step-by-step directions, which should be independent of any programming code.

Well Defined Instructions - An algorithm is a set of well-defined instructions in sequence to solve a problem.

Step by step approach - Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a certain order to get the desired output.

Language independent - Algorithms are generally created independent of underlying languages, i.e. an algorithm can be implemented in more than one programming language.

Example Problem − Design an algorithm to add two numbers and display the result.

Step 1 − START

Step 2 − declare three integers a, b & c

Step 3 − define values of a & b

Step 4 − add values of a & b

Step 5 − store output of step 4 to c

Step 6 − print c

Step 7 − STOP

====================================================================

Question 8. What is Computer Program?

Answer: A computer program is a sequence of instructions to solve given problem. The computer program is called high level program which is not understandable by computer. The high level program need to be converted into low level or machine understandable code in the form of 0’s and 1’s. The computer program is translated into low level machine code using compiler or interpreter. A collection of computer programs, libraries, and related data is called software.

=========================================================================

Question 9. Explain the concept of analyzing programs.

Answer: In computer science, program analysis is the process of automatically analyzing the behavior of computer programs regarding a property such as correctness, robustness, safety and liveness. Program analysis focuses on two major areas: program optimization and program correctness.

The first focuses on improving the program’s performance while reducing the resource usage while the latter focuses on ensuring that the program does what it is supposed to do. Program analysis participates to Software Intelligence by allowing to provide information used in the mastering and understanding of software systems.

Program analysis can be performed without executing the program (static program analysis), during runtime (dynamic program analysis) or in a combination of both.

Static program analysis

In the context of program correctness, static analysis can discover vulnerabilities during the development phase of the program. These vulnerabilities are easier to correct than the ones found during the testing phase since static analysis leads to the root of the vulnerability.

Due to many forms of static analysis being computationally undecidable, the mechanisms for doing it will not always terminate with the right answer – either because they sometimes return a false negative (“no problems found” when the code does in fact have problems) or a false positive, or because they never return the wrong answer but sometimes never terminate. Despite their limitations, the first type of mechanism might reduce the number of vulnerabilities, while the second can sometimes give strong assurance of the lack of a certain class of vulnerabilities.

Incorrect optimizations are highly undesirable. So, in the context of program optimization, there are two main strategies to handle computationally undecidable analysis:

1. An optimizer that is expected to complete in a relatively short amount of time, such as the optimizer in an optimizing compiler, may use a truncated version of an analysis that is guaranteed to complete in a finite amount of time, and guaranteed to only find correct optimizations.

2. A third-party optimization tool may be implemented in such a way as to never produce an incorrect optimization, but also so that it can, in some situations, continue running indefinitely until it finds one (which may never happen). In this case, the developer using the tool would have to stop the tool and avoid running the tool on that piece of code again (or possibly modify the code to avoid tripping up the tool).

However, there is also a third strategy that is sometimes applicable for languages that are not completely specified, such as C. An optimizing compiler is at liberty to generate code that does anything at runtime – even crashes – if it encounters source code whose semantics are unspecified by the language standard in use.

Control-flow

The purpose of control-flow analysis is to obtain information about which functions can be called at various points during the execution of a program. The collected information is represented by a control flow graph (CFG) where the nodes are instructions of the program and the edges represent the flow of control. By identifying code blocks and loops CFG becomes a starting point for compiler made optimizations.

Data-flow analysis

Data-flow analysis is a technique designed to gather information about the values at each point of the program and how they change over time. This technique is often used by compilers to optimize the code. One of the most known examples of data-flow analysis is taint checking which consists of considering all variables which contain user supplied data – which is considered “tainted”, i.e. insecure – and preventing those variables from being used until they have been sanitized. This technique is often used to prevent SQL injection attacks. Taint checking can be done statically or dynamically.

Abstract interpretation

Abstract interpretation allows to extract information about a possible execution of a program without actually executing the program. This information can be used by compilers to look for possible optimizations or for certifying a program against certain classes of bugs.

Type systems

Type systems associate types to programs that fulfil certain requirements. Their purpose is to select a subset of programs of a language that are considered correct according to a property. Type checking – verify whether the program is accepted by the type system.

Type checking is used in programming to limit how a programming object is used and what can they do. This is done by the compiler or interpreter. Type checking can also help preventing vulnerabilities by ensuring that a signed value isn’t attributed to an unsigned variable. Type checking can be done statically (at compile time), dynamically (at runtime) or a combination of both. Static type information (either inferred, or explicitly provided by type annotations in the source code) can also be used to do optimizations, such as replacing boxed arrays with unboxed arrays.

Effect systems

Effect systems are formal systems designed to represent the effects executing a function or method can have. An effect codifies what is being done and with what it is being done – usually referred to as effect kind and region, respectively.

Model checking

Model checking refers to strict, formal and automated ways to check if a model (which in this context means a formal model of a piece of code, though in other contexts it can be a model of a piece of hardware) complies with a given specification. Due to the inherent finite state nature of code, and both the specification and the code being convertible into logical formulae, it is possible to check if the system violates the specification using efficient algorithmic methods.

Dynamic program analysis

Dynamic analysis can use runtime knowledge of the program to increase the precision of the analysis while also providing runtime protection but can only analyze a single execution of the problem and might degrade the program’s performance due to runtime checks.

Testing

Software should be tested to ensure its quality and that it performs as it is supposed to in a reliable manner, and that won’t create conflicts with other software that may function alongside it. The tests are performed by executing the program with an input and evaluating its behavior and the produced output. Even if no security requirements are specified, additional security testing should be performed to ensure that an attacker can’t tamper with the software and steal information, disrupt the software’s normal operations or use it as a pivot to attack its users.

Monitoring

Program monitoring records and logs different kinds of information about the program such as resource usage, events and interaction so it can be reviewed to find abnormal behavior or even pinpoint what caused the abnormal behavior. Furthermore it can be used to perform security audits. Automated monitoring of programs is sometimes referred to as runtime verification.

Program slicing

For a given subset of a program’s behavior, program slicing consists of reducing the program to the minimum form that still produces the selected behavior. The reduced program is called a “slice” and is a faithful representation of the original program within the domain of the specified behavior subset. Generally, finding a slice is an unsolvable problem, but by specifying the target behavior subset by the values of a set of variables, it is possible to obtain approximate slices using a data-flow algorithm. These slices are usually used by developers during debugging to locate the source of errors.

Engineering Tutorial

Search This Blog