CSE 581 INTRO TO DATABASE MANAGEMENT SYSTEMS Review Catalog description • DBMS building blocks; entity-relationship and relational models; SQL/Oracle; integrity constraints; database design; file structures; indexing; query processing; transactions and recovery; overview of object relational DBMS, data warehouses, data mining. • Main objective: • Attain a strong foundation to understand and analyze database management systems and to apply these insights and principles to future databases. It’s all about … • Data, and its management • Great need/demand: • Microsoft, Google, Oracle, IBM, Amazon, … • Startups • Science: • Large data • Analytics • Discuss fundamentals of data management • How to query databases, • design databases, • build applications with them. Course outline • From a user’s perspective • Intro to DBs, SQL, Management Studio • Select, Table Joins, Unions • Summary Queries, Subqueries • Data manipulation, Datatypes • Subqueries Functions • Understanding how it works • Database Design • Database Implementation • Views, Scripts • Stored Procedures, Functions • Transactions, Locking • Server Security • XML Weekly schedule A little history Large Systems, 2020 Terabytes of web data Petabytes of stored data 1954 1971 2000 US Census, 1880 50 Million people Age, gender, occupation, education Hollerith, 1890 80 variables Punch cards 6 weeks analysis 2025 Big data! • Increasingly many companies see themselves as data driven. Infrastructure is changing… • New technology, same methodology www.bigdatalandscape.com What is a Database Management System (DBMS)? • A large, integrated collection of data • A piece of software designed to store and manage databases • Models a real-world enterprise • Entities (e.g., Students, Courses, Professors) • Relationships (e.g., Mike is enrolled in 581) • Who takes what? • Who teaches what? Database system environment Applications Programs DBMS System admin Database admin Database designer Programmers & Analysts End users DBMS utilities Procedures and standards write and enforce manage design accesswrite use Hardware Student Data Faculty Data Room Data Student Schedule Faculty Schedule Room Schedule Data Data independence • Applications do not need to worry about how the data is structured and stored • Logical data independence: protection from changes in the logical structure of the data • i.e. should not need to ask: can we add a new entity or attribute without rewriting the application? • Physical data independence: protection from physical layout changes • I.e. should not need to ask: which disks are the data stored on? Is the data indexed? Transactions • Security: Different users, different roles • Performance: Need to provide concurrent access • Consistency: Concurrency can lead to update problems DBMS allows user to write programs as if they were the only user CRUD ACID Create Replace Update Delete Consistent Isolated DurableAtomic Unit 1 – Databases 1. Identify the three main hardware components of a client/server system. 2. Describe the way a client accesses the database on a server using these terms: application software, data access API, database management system, SQL query, and query results. 3. Describe the way a database is organized using these terms: tables, columns, rows, and cells. 4. Describe how the tables in a relational database are related using these terms: primary key and foreign key. 5. Identify the three types of relationships that can exist between two tables. 6. Describe the way the columns in a table are defined using these terms: data type, null value, default value, and identity column. Objectives • Describe the relationship between standard SQL and Microsoft SQL Server’s Transact-SQL. • Describe the difference between DML statements and DDL statements. • Describe the difference between an action query and a SELECT query. • List three coding techniques that can make your SQL code easier to read and maintain. • Explain how views and stored procedures differ from SQL statements that are issued from an application program. • Describe the use of command, connection, and data reader objects when .NET applications access a SQL Server database. A comparison of relational databases and other database systems Feature Hierarchical database Network database Relational database Supported relationships One-to-many only One-to-many, one-to-one, and many-to-many One-to-many, one-to-one, and many-to-many; ad hoc relationships can also be used Data access Programs must include code to navigate through the physical structure of the database Programs must include code to navigate through the physical structure of the database Programs can access data without knowing its physical structure A comparison of relational databases and other database systems (continued) Feature Hierarchical database Network database Relational database Maintenance New and modified relationships can be difficult to implement in application programs New and modified relationships can be difficult to implement in application programs Programs can be used without modification when the definition of a table changes Important events in the history of SQL • 1970 Dr. E. F. Codd developed the relational database model. • 1978 IBM developed the predecessor to SQL, called Structured English Query Language (SEQUEL). • 1979 Relational Software, Inc. (later renamed Oracle) released the first relational DBMS, Oracle. • 1982 IBM released their first relational database system, SQL/DS (SQL/Data System). • 1985 IBM released DB2 (Database 2). • 1987 Microsoft released SQL Server. • 1989 ANSI published the first set of standards (ANSI/ISO SQL- 89, or SQL1). • 1992 ANSI revised standards (ANSI/ISO SQL-92, or SQL2). • 1999 ANSI published SQL3 (ANSI/ISO SQL:1999). Common options for accessing SQL Server data A Visual Basic function that uses ADO.NET to retrieve data from SQL Server Public Shared Function GetVendor( vendorID As Integer) As Vendor Dim vendor As New Vendor ' Create the connection object Dim connection As New SqlConnection() connection.ConnectionString = "Data Source=localhost\SqlExpress;" & "Initial Catalog=AP;Integrated Security=True" ' Create the command object and set the connection, ' SELECT statement, and parameter value Dim selectCommand As New SqlCommand selectCommand.Connection = connection selectCommand.CommandText = "SELECT VendorID, " & "VendorName, VendorAddress1, VendorAddress2, " & "VendorCity, VendorState, VendorZipCode " & "FROM Vendors WHERE VendorID = @VendorID" selectCommand.Parameters.AddWithValue( "@VendorID", vendorID) A Visual Basic function that uses ADO.NET to retrieve data from SQL Server (continued) ' Open the connection to the database connection.Open() ' Retrieve the row specified by the SELECT statement ' and load it into the Vendor object Dim reader As SqlDataReader = selectCommand.ExecuteReader If reader.Read Then vendor.VendorID = CInt(reader("VendorID")) vendor.VendorName = reader("VendorName").ToString vendor.VendorAddress1 = reader("VendorAddress1").ToString vendor.VendorAddress2 = reader("VendorAddress2").ToString vendor.VendorCity = reader("VendorCity").ToString vendor.VendorState = reader("VendorState").ToString vendor.VendorZipCode = reader("VendorZipCode").ToString A Visual Basic function that uses ADO.NET to retrieve data from SQL Server (continued) Else vendor = Nothing End If reader.Close() ' Close the connection to the database connection.Close() Return vendor End Function A C# method that uses ADO.NET to retrieve data from SQL Server public static Vendor GetVendor(int vendorID) { Vendor vendor = new Vendor(); // Create the connection object SqlConnection connection = new SqlConnection(); connection.ConnectionString = "Data Source=localhost\\SqlExpress;" + "Initial Catalog=AP;Integrated Security=True"; // Create the command object and set the connection, // SELECT statement, and parameter value SqlCommand selectCommand = new SqlCommand(); selectCommand.Connection = connection; selectCommand.CommandText = "SELECT VendorID, " + "VendorName, VendorAddress1, VendorAddress2, " + "VendorCity, VendorState, VendorZipCode " + "FROM Vendors WHERE VendorID = @VendorID"; selectCommand.Parameters.AddWithValue( "@VendorID", vendorID); A C# method that uses ADO.NET to retrieve data from SQL Server (continued) // Open the connection to the database connection.Open(); // Retrieve the row specified by the SELECT statement // and load it into the Vendor object SqlDataReader reader = selectCommand.ExecuteReader(); if (reader.Read()) { vendor.VendorID = (int)reader["VendorID"]; vendor.VendorName = reader["VendorName"].ToString(); vendor.VendorAddress1 = reader["VendorAddress1"].ToString(); vendor.VendorAddress2 = reader["VendorAddress2"].ToString(); vendor.VendorCity = reader["VendorCity"].ToString(); vendor.VendorState = reader["VendorState"].ToString(); A C# method that uses ADO.NET to retrieve data from SQL Server (continued) vendor.VendorZipCode = reader["VendorZipCode"].ToString(); } else { vendor = null; } reader.Close(); // Close the connection to the database connection.Close(); return vendor; } Unit 1b – Management Studio •Use the Management Studio to do any of the following management tasks: 1. Start or stop an instance of SQL Server Express 2. Enable a remote connection to a database 3. Navigate through the objects of a database 4. Attach, detach, back up, or restore a database 5. Set the compatibility level for a database 6. View the database diagrams for a database 7. View the column definitions for a table in a database 8. View or modify the data of a table in a database 9. Modify the column definitions for a table in a database •Use the Management Studio to run, open, and save SQL queries. •Use the Query Designer to build and run queries. •Use SQL Server documentation to look up information about SQL Server. Unit 1b - Knowledge • Briefly describe the function of each of these client tools: the Management Studio, the Query Designer, and SQL Server documentation. • Distinguish between Windows authentication and SQL Server authentication for connecting to a SQL Server database. • Describe what “compatibility level” means when it is applied to SQL Server. Unit 2 - Data Retrieval From a Single Table Applied • Code and run SELECT statements that use any of the language elements. Knowledge • Distinguish between the base table values and the calculated values in SELECT statements. • Describe the use of a column alias. • Describe the order of precedence and the use of parentheses for arithmetic expressions. • Describe the use of the DISTINCT keyword and the TOP clause. • Describe the use of comparison operators, logical operators, and parentheses in WHERE clauses. Unit 2 - Data Retrieval From a Single Table Knowledge • Describe the use of the IN, BETWEEN, and LIKE operators in WHERE clauses. • Describe the use of the IS NULL clause in a WHERE clause. • Describe the use of column names, aliases, calculated values, and column numbers in ORDER BY clauses. • Describe the use of the OFFSET and FETCH clauses in ORDER BY clauses. An ORDER BY clause that retrieves rows 11 through 20 SELECT VendorName, VendorCity, VendorState, VendorZipCodeFROM Vendors WHERE VendorState = 'CA' ORDER BY VendorCityOFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY; Unit 3 - Summary queries and subqueries 1. HAVING and WHERE clauses 2. WITH ROLLUP and WITH CUBE operators 3. GROUPING SETS operator 4. OVER clause 5. Subqueries 6. Common table expressions (CTEs) to define the subqueries 7. Correlated and noncorrelated subqueries A summary query that calculates the average invoice amount by vendor SELECT VendorID, AVG(InvoiceTotal) AS AverageInvoiceAmount FROM Invoices GROUP BY VendorID HAVING AVG(InvoiceTotal) > 2000 ORDER BY AverageInvoiceAmount DESC; Execution order: 1.FROM 2.ON 3.JOIN 4.WHERE 5.GROUP BY 6.CUBE | ROLLUP 7.HAVING 8.SELECT 9.DISTINCT 10.ORDER BY 11.TOP 12.LIMIT | OFFSET Why not the above alias? A summary query with a search condition in the WHERE clause SELECT VendorName, COUNT(*) AS InvoiceQty, AVG(InvoiceTotal) AS InvoiceAvg FROM Vendors JOIN Invoices ON Vendors.VendorID = Invoices.VendorID WHERE InvoiceTotal > 500 GROUP BY VendorName ORDER BY InvoiceQty DESC; (20 rows) A summary query with a summary row for each set of groups SELECT VendorState, VendorCity, COUNT(*) AS QtyVendors FROM Vendors WHERE VendorState IN ('IA', 'NJ') GROUP BY VendorState, VendorCity WITH CUBE ORDER BY VendorState DESC, VendorCity DESC; GROUP BY CUBE(VendorState, VendorCity) With rollup With single column, rollup same as cube Four ways to introduce a subquery in a SELECT statement 1. In a WHERE clause as a search condition 2. In a HAVING clause as a search condition 3. In the FROM clause as a table specification 4. In the SELECT clause as a column specification A subquery in the WHERE clause SELECT InvoiceNumber, InvoiceDate, InvoiceTotal FROM Invoices WHERE InvoiceTotal > (SELECT AVG(InvoiceTotal) FROM Invoices) ORDER BY InvoiceTotal; The value returned by the subquery: 1879.7413 (21 rows) Where a subquery can be introduced If a subquery returns… It can be introduced… A single value Anywhere an expression is allowed A result set with a single column In place of a list of values A result set with one or more columns In place of a table in the FROM clause A query that uses an inner join SELECT InvoiceNumber, InvoiceDate, InvoiceTotal FROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorID WHERE VendorState = 'CA' ORDER BY InvoiceDate; (40 rows) The same query restated with a subquery SELECT InvoiceNumber, InvoiceDate, InvoiceTotal FROM Invoices WHERE VendorID IN (SELECT VendorID FROM Vendors WHERE VendorState = 'CA’) ORDER BY InvoiceDate; The same result set (40 rows) Advantages of joins and subqueries Advantages of joins • The result of a join operation can include columns from both tables. • A join tends to be more intuitive when it uses an existing relationship between two tables. • A query with a join typically performs faster than the same query with a subquery. Advantages of subqueries • A subquery can pass an aggregate value to the outer query. • A subquery tends to be more intuitive when it uses an ad hoc relationship between two tables. • Long, complex queries can sometimes be easier to code using subqueries. The syntax of a WHERE clause that uses an IN phrase with a subquery WHERE test_expression [NOT] IN (subquery) A query that returns vendors without invoices SELECT VendorID, VendorName, VendorState FROM Vendors WHERE VendorID NOT IN (SELECT DISTINCT VendorID FROM Invoices); A procedure for building complex queries 1. State the problem to be solved by the query in English. 2. Use pseudocode to outline the query. 3. If necessary, use pseudocode to outline each subquery. 4. Code the subqueries and test them to be sure that they return the correct data. 5. Code and test the final query. Unit 4 - Insert, update and delete data Application • Given the specifications for an action query, code the INSERT, UPDATE, or DELETE statement for doing the action. • Use the MERGE statement to merge rows from a source table into a target table. • Create a copy of a table by using the INTO clause of the SELECT statement. Knowledge • Describe the three types of action queries. • Explain how to handle null values and default values when coding INSERT and UPDATE statements. • Explain how the FROM clause is used in an UPDATE or DELETE statement. • Explain how the MERGE statement works. Insert three rows INSERT INTO InvoiceCopy VALUES (95, '111-10098', ‘2021-04-01', 219.50, 0, 0, 1, ‘2021-04-30', NULL), (102, '109596', ‘2021-04-01', 22.97, 0, 0, 1, ‘2021-04-30’, NULL), (72, '40319', ‘2021-04-01', 173.38, 0, 0, 1, ‘2021-04-30’, NULL); The response from the system (3 row(s) affected) Update one column of multiple rows UPDATE InvoiceCopy SET TermsID = 1 WHERE VendorID = 95; Update a column using an arithmetic expression UPDATE InvoiceCopy SET CreditTotal = CreditTotal + 100 WHERE InvoiceNumber = '97/522'; (6 row(s) affected) (1 row(s) affected) Warning • If you omit the WHERE clause, all the rows in the table will be updated. A subquery used in a search condition UPDATE InvoiceCopy SET TermsID = 1 WHERE VendorID = (SELECT VendorID FROM VendorCopy WHERE VendorName = 'Pacific Bell'); Another subquery used in a search condition UPDATE InvoiceCopy SET TermsID = 1 WHERE VendorID IN (SELECT VendorID FROM VendorCopy WHERE VendorState IN ('CA', 'AZ', 'NV')); (6 row(s) affected) (51 row(s) affected) A subquery used in a search condition DELETE VendorCopy WHERE VendorID NOT IN (SELECT DISTINCT VendorID FROM InvoiceCopy); A derived table joined with another table DELETE VendorCopy FROM VendorCopy JOIN (SELECT VendorID, SUM(InvoiceTotal) AS TotalOfInvoices FROM InvoiceCopy GROUP BY VendorID) AS InvoiceSum ON VendorCopy.VendorID = InvoiceSum.VendorID WHERE TotalOfInvoices <= 100; (88 row(s) affected) (6 row(s) affected) Insert and update rows MERGE InvoiceArchive AS ia USING InvoiceCopy AS ic ON ic.InvoiceID = ia.InvoiceID WHEN MATCHED AND ic.PaymentDate IS NOT NULL AND ic.PaymentTotal > ia.PaymentTotal THEN UPDATE SET ia.PaymentTotal = ic.PaymentTotal ia.CreditTotal = ic.CreditTotal, ia.PaymentDate = ic.PaymentDate WHEN NOT MATCHED THEN INSERT (InvoiceID, VendorID, InvoiceNumber, InvoiceTotal, PaymentTotal, CreditTotal, TermsID, InvoiceDate, InvoiceDueDate) VALUES (ic.InvoiceID, ic.VendorID, ic.InvoiceNumber, ic.InvoiceTotal, ic.PaymentTotal, ic.CreditTotal, ic.TermsID, ic.InvoiceDate, ic.InvoiceDueDate) ; Unit 4b - work data with data types 1. Describe the data that can be stored in any of the string, numeric, date/time, and large value data types. 2. Describe the difference between standard character data and Unicode character data. 3. Describe the differences between implicit and explicit data type conversion. Data types Type Bytes biginit 8 int 4 smallint 2 tinyint 1 bit 1 The integer data types Type Bytes decimal[(p[,s])] 5-17 numeric[(p[,s])] 5-17 money 8 smallmoney 4 decimal[(p[,s])] 5-17 The decimal data types Type Bytes float[(n)] 4 or 8 real 4 The real data types Precision: # of digits in a number. Scale: # of digits to the right of the decimal point in a number. The number 543.21 has a precision of 5 and a scale of 2 Date/time data types and formats prior to SQL Server 2008 for SQL Server 2008 and later Type Bytes datetime 8 smalldatetime 4 Type Bytes date 3 time(n) 3-5 datetime2(n) 6-8 datetimeoffset(n) 8-10 Common Formats Example yyyy-mm-dd 2017-09-22 mm/dd/yyyy 9/22/2017 mm-dd-yy 9-22-17 Month dd, yyyy September 22, 2017 Mon dd, yy Sep 22, 17 dd Mon yy 22 Sep 17 Common Formats Example hh:mi 16:20 hh:mi am/pm 4:20 pm hh:mi:ss 4:20:36 hh:mi:ss:mmm 4:20:36:12 hh:mi:ss.nnnnnnn 4:20:36.1234567 Order of precedence for common data types Precedence Category Data type Highest Date/time datetime smalldatetime Numeric float real decimal money smallmoney int smallint tinyint bit String nvarchar nchar varchar Lowest char When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence. If the conversion is not a supported implicit conversion, an error is returned. When both operand expressions have the same data type, the result of the operation has that data type. Conversions that can’t be done implicitly From data type To data type char, varchar, nchar, nvarchar money, smallmoney datetime, smalldatetime decimal, numeric, float, real, bigint, int, smallint, tinyint, money, smallmoney, bit money, smallmoney char, varchar, nchar, nvarchar Expressions that use implicit conversion InvoiceTotal * .0775 -- InvoiceTotal (money) converted to decimal PaymentTotal – 100 -- Numeric literal (100) converted to money PaymentDate = ‘2021-04-05' -- Date literal converted to smalldatetime value The syntax of the CAST function CAST(expression AS data_type) A SELECT statement that uses the CAST function SELECT InvoiceDate, InvoiceTotal, CAST(InvoiceDate AS varchar) AS varcharDate, CAST(InvoiceTotal AS int) AS integerTotal, CAST(InvoiceTotal AS varchar) AS varcharTotal FROM Invoices; The syntax of the CONVERT function CONVERT(data_type, expression [, style]) Convert and format dates SELECT CONVERT(varchar, InvoiceDate) AS varcharDate, CONVERT(varchar, InvoiceDate, 1) AS varcharDate_1, CONVERT(varchar, InvoiceDate, 107) AS varcharDate_107, CONVERT(varchar, InvoiceTotal) AS varcharTotal, CONVERT(varchar, InvoiceTotal, 1) AS varcharTotal_1 FROM Invoices; CONVERT (TSQL) and CAST (ANSI) are very similar. CONVERT is more flexible in that you can format dates etc. For general purpose, use CAST. Unit 5 - Work with functions Application • Code queries that require any functions for working with string, numeric, and date/time data. Knowledge • Describe how the use of functions can solve the problems associated with (1) sorting string data that contains numeric values, and (2) doing date or time searches. A SELECT statement that formats real numbers SELECT ID, R, CAST(R AS decimal(9,3)) AS R_decimal, CAST(CAST(R AS decimal(9,3)) AS varchar(9)) AS R_varchar, LEN(CAST(CAST(R AS decimal(9,3)) AS varchar(9))) AS R_LEN, SPACE(9 - LEN(CAST(CAST(R AS decimal(9,3)) AS varchar(9)))) + CAST(CAST(R AS decimal(9,3)) AS varchar(9)) AS R_Formatted FROM RealSample; String function examples LEN('SQL Server') 10 LEN(' SQL Server ') 12 LEFT('SQL Server', 3) 'SQL' LTRIM(' SQL Server ') 'SQL Server ' RTRIM(' SQL Server ') ' SQL Server' LTRIM(RTRIM(' SQL Server ')) 'SQL Server' LOWER('SQL Server') 'sql server' UPPER('ny') 'NY' String function examples PATINDEX('%v_r%', 'SQL Server') 8 CHARINDEX('SQL’, ' SQL Server') 3 CHARINDEX('-', '(559) 555-1212') 10 SUBSTRING('(559) 555-1212', 7, 8) 555-1212 REPLACE(RIGHT('(559) 555-1212', 13), ') ', '-') 559-555-1212 CONCAT('Run time: ',1.52,' seconds') Run time: 1.52 seconds Note the space! Can use wildcards No wildcards Starting position of the pattern: ‘v_r’ Why 3 not 1? Examples that use date/time functions Function Result GETDATE() 2017-09-30 14:10:13.813 GETUTCDATE() 2017-09-30 21:10:13.813 SYSDATETIME() 2017-09-30 14:10:13.8160822 SYSUTCDATETIME() 2017-09-30 21:10:13.8160822 SYSDATETIMEOFFSET() 2017-09-30 14:10:13.8160822 -07.00 MONTH('2017-09-30') 9 DATEPART(month,'2017-09-30') 9 DATENAME(month,'2017-09-30') September SELECT statements that ignore time values (continued) • Use the CAST function to remove time values SELECT * FROM DateSample WHERE CAST(CAST(StartDate AS char(11)) AS datetime) = '2011-10-28’; • Use the CONVERT function to remove time values SELECT * FROM DateSample WHERE CONVERT(datetime, CONVERT(char(10), StartDate, 110)) = '2011-10-28’; • The result set The syntax of the searched CASE function CASE WHEN conditional_expression_1 THEN result_expression_1 [WHEN conditional_expression_2 THEN result_expression_2]... [ELSE else_result_expression] END • A SELECT statement with a searched CASE function SELECT InvoiceNumber, InvoiceTotal, InvoiceDate, InvoiceDueDate, CASE WHEN DATEDIFF(day, InvoiceDueDate, GETDATE()) > 30 THEN 'Over 30 days past due’ WHEN DATEDIFF(day, InvoiceDueDate, GETDATE()) > 0 THEN '1 to 30 days past due’ ELSE 'Current’ END AS Status FROM Invoices WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0; The syntax of the COALESCE and ISNULL functions COALESCE(expression_1 [, expression_2]...) • A SELECT statement with a COALESCE function SELECT PaymentDate, COALESCE(PaymentDate, '1900-01-01') AS NewDate FROM Invoices; Returns the first nonnull expression among its arguments Replaces NULL with the specified replacement value. CASE WHEN (exp1 IS NOT NULL) THEN exp1 WHEN (exp2 IS NOT NULL) THEN exp2 ... ELSE expN END ISNULL(check_expression, replacement_value) • The same statement with an ISNULL function SELECT PaymentDate, ISNULL(PaymentDate, '1900-01-01') AS NewDate FROM Invoices; With 2 arguments, same result A query that uses four more functions SELECT SalesYear, RepID, SalesTotal, PERCENT_RANK() OVER (PARTITION BY SalesYear ORDER BY SalesTotal) AS PctRank, CUME_DIST() OVER (PARTITION BY SalesYear ORDER BY SalesTotal) AS CumeDist, PERCENTILE_CONT(.5) WITHIN GROUP (ORDER BY SalesTotal) OVER (PARTITION BY SalesYear) AS PercentileCont, PERCENTILE_DISC(.5) WITHIN GROUP (ORDER BY SalesTotal) OVER (PARTITION BY SalesYear) AS PercentileDisc FROM SalesTotals; Relative rank of a row within a group of rows Cumulative distribution of a value in a group of values A percentile based on a continuous distribution of the column value Unit 6 – Database Design Applied • Given the specifications for a database modeled on a real- world system, design the database. Identify tables, columns, keys, relationships, and indexes for the new database. • Given a diagram for an unnormalized database, normalize the structure to the third normal form. Knowledge • In general terms, describe the criteria for indexing a column. • Explain how referential integrity prevents deletion, insertion, and update anomalies. • Explain how normalizing a database to the third normal form improves database performance. A database system is modeled after a real-world system Mapping People Documents Facilities Other systems Real-world system Database system R ow s Columns Tables The six basic steps for designing a data structure Step 1: Identify the data elements Step 2: Subdivide each element into its smallest useful components Step 3: Identify the tables and assign columns Step 4: Identify the primary and foreign keys Step 5: Review whether the data structure is normalized Step 6: Identify the indexes The relationships between the tables in the accounts payable system Primary Key One-to-Many Relationship Foreign Key Composite Key Anomalies Student Courses Room Jane CSE581 HBCKitt Jim CSE581 HBCKitt Joe CSE581 HBCKitt … … … 1. If every course is in only one room, contains redundant information! 2. If we update the room number for one tuple, we get inconsistent data = an update anomaly 3. If everyone drops the class, we lose what room the class is in! = a delete anomaly 4. We can’t reserve a room without students = an insert anomaly Courses Room CSE581 HBCKitt … … Student Courses Jane CSE581 Jim CSE581 Joe CSE581 … … The first three normal forms Normal form Description First (1NF) The value stored at the intersection of each row and column must be a scalar value, and a table must not contain any repeating columns. Second (2NF) Every non-key column must depend on the entire primary key. Third (3NF) Every non-key column must depend only on the primary key. Third Normal Form • For a table to be in third normal form, every non-key column must depend only on the primary key. • If a column doesn’t depend only on the primary key, it implies that the column is assigned to the wrong table or that it can be computed from other columns in the table. • A column that can be computed from other columns contains derived data. The benefits of normalization • More tables, and each table can have a clustered index, the database has more clustered indexes. • makes data retrieval more efficient • Each table contains information about a single entity, and each index has fewer columns (usually one) and fewer rows. • makes data retrieval and insert, update, and delete operations more efficient • Data redundancy is minimized. • simplifies maintenance and reduces storage The invoice data in first normal form When to denormalize • When a column from a joined table is used repeatedly in search criteria. • If a table is updated infrequently. • Include columns with derived values when those values are used frequently in search conditions. Unit 7 – Create a database Applied • Given a complete database design, write the SQL DDL statements to create the database, including all tables, relationships, constraints, indexes, and sequences. Knowledge • Describe how each of these types of constraints restricts the values that can be stored in a table: NOT NULL, PRIMARY KEY, UNIQUE, CHECK, and FOREIGN KEY (or REFERENCES). • Describe the difference between a column-level constraint and a table-level constraint. • Explain how the CASCADE and NO ACTION options differ in enforcing referential integrity on deletes and updates. • Describe the use of a sequence. • Describe the use of a script that contains one or more batches for creating a database. DDL statements to create, modify, and delete objects • CREATE DATABASE • CREATE TABLE • CREATE INDEX • CREATE SEQUENCE • CREATE FUNCTION • CREATE PROCEDURE • CREATE TRIGGER • CREATE VIEW • ALTER TABLE • ALTER SEQUENCE • ALTER FUNCTION • ALTER PROCEDURE • ALTER TRIGGER • ALTER VIEW • DROP DATABASE • DROP TABLE • DROP SEQUENCE • DROP INDEX • DROP FUNCTION • DROP PROCEDURE • DROP TRIGGER • DROP VIEW Formatting rules for identifiers • The first character of an identifier must be a letter as defined by the Unicode Standard 2.0, an underscore (_), an at sign (@), or a number sign (#). • All characters after the first must be a letter as defined by the Unicode Standard 2.0, a number, an at sign, a dollar sign ($), a number sign, or an underscore. • An identifier can’t be a Transact-SQL reserved keyword. • An identifier can’t contain spaces or special characters other than those already mentioned. Valid regular identifiers • Employees • #PaidInvoices • ABC$123 • Invoice_Line_Items • @TotalDue Valid delimited identifiers • [%Increase] • "Invoice Line Items" • [@TotalDue] Basic syntax of the CREATE DATABASE statement CREATE DATABASE database_name [ON [PRIMARY] (FILENAME = 'file_name')] [FOR ATTACH] Create a new database CREATE DATABASE New_AP; Attach an existing database file CREATE DATABASE Test_AP ON PRIMARY (FILENAME = 'C:\Murach\SQL Server 2019\Databases\Test_AP.mdf') FOR ATTACH; The response from the system Command(s) completed successfully. The response from the system Command(s) completed successfully. Primary database file: Schema and Data Basic syntax of the CREATE TABLE statement CREATE TABLE table_name (column_name_1 data_type [column_attributes] [, column_name_2 data_type [column_attributes]]... [, table_attributes]) Common column attributes • NULL|NOT NULL • PRIMARY KEY|UNIQUE • IDENTITY • DEFAULT default_value • SPARSE • Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve nonnull values. • Consider using sparse columns when the space saved is at least 20 percent to 40 percent. Values will be allocated to the column [according to the seed and step]. Does not guarantee: Uniqueness; Consecutive values; Reuse of values Common: PRIMARY KEY IDENTITY Clustered index: For faster access The CREATE INDEX statement CREATE [CLUSTERED|NONCLUSTERED] INDEX index_name ON table_name (col_name_1 [ASC|DESC] [, col_name_2 [ASC|DESC]]...) [WHERE filter-condition] Create a nonclustered index on a single column CREATE INDEX IX_VendorID ON Invoices (VendorID); Create a nonclustered index on two columns CREATE INDEX IX_Invoices ON Invoices (InvoiceDate DESC, InvoiceTotal); Note: SQL Server automatically creates a clustered index for a table’s primary key. Default ASC Column-level constraints Constraint Description NOT NULL Prevents null values from being stored in the column. PRIMARY KEY Requires that each row in the table have a unique value in the column. Null values are not allowed. UNIQUE Requires that each row in the table have a unique value in the column. CHECK Limits the values for a column. [FOREIGN KEY] REFERENCES Enforces referential integrity between a column in the new table and a column in a related table. Create a table with a two-column primary key constraint CREATE TABLE InvoiceLineItems1 (InvoiceID INT NOT NULL, InvoiceSequence SMALLINT NOT NULL, InvoiceLineItemAmount MONEY NOT NULL, PRIMARY KEY (InvoiceID, InvoiceSequence)); Create a table with two column-level check constraints CREATE TABLE Invoices1 (InvoiceID INT NOT NULL IDENTITY PRIMARY KEY, InvoiceTotal MONEY NOT NULL CHECK (InvoiceTotal >= 0), PaymentTotal MONEY NOT NULL DEFAULT 0 CHECK (PaymentTotal >= 0)); The same check constraints coded at the table level CREATE TABLE Invoices2 (InvoiceID INT NOT NULL IDENTITY PRIMARY KEY, InvoiceTotal MONEY NOT NULL, PaymentTotal MONEY NOT NULL DEFAULT 0, CHECK ((InvoiceTotal >= 0) AND (PaymentTotal >= 0))); A column-level foreign key constraint A statement that creates the primary key table CREATE TABLE Vendors9 (VendorID INT NOT NULL PRIMARY KEY, VendorName VARCHAR(50) NOT NULL); A statement that creates the foreign key table CREATE TABLE Invoices9 (InvoiceID INT NOT NULL PRIMARY KEY, VendorID INT NOT NULL REFERENCES Vendors9 (VendorID), InvoiceTotal MONEY NULL); Alterations ALTER TABLE Vendors ADD LastTranDate SMALLDATETIME NULL; ALTER TABLE Vendors DROP COLUMN LastTranDate; ALTER TABLE Invoices WITH NOCHECK ADD CHECK (InvoiceTotal >= 1); ALTER TABLE InvoiceLineItems WITH CHECK ADD FOREIGN KEY (AccountNo) REFERENCESGLAccounts(AccountNo); ALTER TABLE InvoiceLineItems ALTER COLUMN InvoiceLineItemDescription VARCHAR(200); Add a new column Drop a column Add a new check constraint Add a foreign key constraint Change the data type of a column Entity-Relationship Model E/R is a visual syntax for database design: • precise enough for technical points • abstracted enough for non-technical people Entity Set: Classes or types of objects in the model Attribute Relationship: a subset of all possible pairs of entities with tuples uniquely identified by P and C’s keys From E/R to relational schema PurchasedProduct name category price Person firstnamedate lastname CREATE TABLE Purchased( name CHAR(50), firstname CHAR(50), lastname CHAR(50), date DATE, PRIMARY KEY (name, firstname, lastname), FOREIGN KEY (name) REFERENCES Product, FOREIGN KEY (firstname, lastname) REFERENCES Person ) Name FirstName LastName Date Wii Jane Smith 10/20/17 Xbox Joe Allan 8/2/16 Purchased A final design ItemInfo PK ItemId Title DateOfRelease FK1 Rating FK2 Genre FK3 ItemType ItemInstance PK,FK1 ItemId PK SerialNumber FK3 Status FK2 MediaType FK4 RentalTermsId FK5 CustomerId ItemRating PK Id Text MediaType PK Id Text FK1 ItemType RentalStatus PK Id Text RentalTerms PK RentalTermsId CostOfRental LengthOfRental FK1 ItemType DailyOverdueCost Actors PK ActorId FirstName LastName ActorsInItems PK,FK1 ActorId PK,FK2 MovieId ItemGenre PK Id Genre FK1 ItemType ItemType PK Id Text Transactions PK TransactionId FK1 CustomerId TransactionTimeStamp TotalPaid ItemsInTransactions PK ItemInTransactionid FK1 SerialNumber FK1 ItemId FK2 RentalTermsId FK3 TransactionId ItemCost Customer PK CustomerId FirstName LastName MembershipNumber Address PK,FK1 CustomerId Street1 Street2 FK2 ZipCode CityAndState PK ZipCode City FK1 State State PK StateId StateName With MS Visio database modeler Objectives - Views Applied • Given a SELECT statement, create a new view based on the statement. • Given a SELECT statement, use the View Designer to create a new view based on the statement. • Use the View Designer to change the design of an existing view. Knowledge • Describe the use of views. • Name the three SQL statements you use to work with views. • Given a SELECT statement, determine whether it can be used as the basis for a view. • Given a SELECT statement, determine whether it can be used as the basis for an updatable view. Objectives (cont.) • Describe the benefits provided by views. • Describe the effects of the WITH SCHEMABINDING and WITH ENCRYPTION clauses on a view. • Describe the effect of the WITH CHECK OPTION clause on an updatable view. • Describe the use of SQL Server’s catalog views for getting information from the system catalog. External schema - views • A set of views over the logical schema, that predicates how users see/access data • It is often not physically materialized, but maintain as a view/query on top of the data. • A “named-query”, or a “virtual-table” • looks like a table and referenced as a table • inserted into/updated, but contains no data • defined by a query on one or more tables • can run SELECT, INSERT, UPDATE and DELETE against a view Create a view and select from it CREATE VIEW MyView1 AS SELECT Column1, Column2 FROM MyTable; CREATE VIEW MyView2 AS SELECT T1.Column1, T1.Column2, T2.Column1 FROM MyTable T1, OtherTable T2 WHERE T1.SomeID = T2.SomeID; SELECT * FROM MyView; Modifying data via views • Not as simple as modifying table data – a view could be combining multiple tables • Restrictions: • cannot DELETE on multi-table view • cannot INSERT on view unless all NOT NULL columns are included in the view • cannot INSERT or UPDATE data if view used DISTINCT • cannot UPDATE a virtual column • if INSERT/UPDATE, all changed records have to belong to a single table Requirements for creating updatable views • The select list can’t include a DISTINCT or TOP clause. • The select list can’t include an aggregate function. • The select list can’t include a calculated value. • The SELECT statement can’t include a GROUP BY or HAVING clause. • The view can’t include the UNION operator. Some of the SQL Server catalog views • sys.schemas (objects: tables, views, procedures, functions, …) • sys.sequences • sys.tables • sys.views • sys.columns • sys.key_constraints • sys.foreign_keys • sys.foreign_key_columns • sys.objects https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/object-catalog-views-transact-sql Unit 8b - Scripts Applied • Given a Transact-SQL script written as a single batch, insert GO commands to divide the script into appropriate batches. • Given the specification for a database problem, write a script that solves it. • Use the SQLCMD utility to execute a query or a script. Knowledge • Describe the use of scripts. • Describe the difference between a scalar variable and a table variable. • Describe the scope of a local variable. • Describe the use of cursors. • Describe the scopes of temporary tables, table variables, and derived tables. • Describe the use of dynamic SQL. • Given a Transact-SQL script, explain what each statement in the script does. The syntax of the USE statement USE database The syntax of the PRINT statement PRINT string_expression A script that uses some of the statements for script processing USE AP; DECLARE @TotalDue money; SET @TotalDue = (SELECT SUM(InvoiceTotal - PaymentTotal - CreditTotal) FROM Invoices); IF @TotalDue > 0 PRINT 'Total invoices due = $' + CONVERT(varchar,@TotalDue,1); ELSE PRINT 'Invoices paid in full'; A SQL script that uses variables USE AP; DECLARE @MaxInvoice money, @MinInvoice money; DECLARE @PercentDifference decimal(8,2); DECLARE @InvoiceCount int, @VendorIDVar int; SET @VendorIDVar = 95; SET @MaxInvoice = (SELECT MAX(InvoiceTotal) FROM Invoices WHERE VendorID = @VendorIDVar); SELECT @MinInvoice = MIN(InvoiceTotal), @InvoiceCount = COUNT(*) FROM Invoices WHERE VendorID = @VendorIDVar; SET @PercentDifference = (@MaxInvoice - @MinInvoice) / @MinInvoice * 100; The syntax of the DECLARE statement for a table variable DECLARE @table_name TABLE (column_name_1 data_type [column_attributes] [, column_name_2 data_type [column_attributes]]... [, table_attributes]) • A table variable can store an entire result set rather than a single value. • Like a scalar variable, a table variable has local scope, available only within the batch where it’s declared. • You can use a table variable like a standard table within SELECT, INSERT, UPDATE and DELETE statements (but not in SELECT INTO). A SQL script that uses a table variable USE AP; DECLARE @BigVendors table (VendorID int, VendorName varchar(50)); INSERT @BigVendors SELECT VendorID, VendorName FROM Vendors WHERE VendorID IN (SELECT VendorID FROM Invoices WHERE InvoiceTotal > 5000); SELECT * FROM @BigVendors; A script that uses a local temporary table SELECT TOP 1 VendorID, AVG(InvoiceTotal) AS AvgInvoice INTO #TopVendors FROM Invoices GROUP BY VendorID ORDER BY AvgInvoice DESC; SELECT Invoices.VendorID, MAX(InvoiceDate) AS LatestInv FROM Invoices JOIN #TopVendors ON Invoices.VendorID = #TopVendors.VendorID GROUP BY Invoices.VendorID; Local temporary table name starts with hash ("#") sign. -- Create Temporary Table CREATE TABLE #Customer ( Id INT, Name VARCHAR(50) ) Create a global temporary table of random numbers CREATE TABLE ##RandomSSNs (SSN_ID int IDENTITY, SSN char(9) DEFAULT LEFT(CAST(CAST(CEILING(RAND()*10000000000) AS bigint) AS varchar),9)); INSERT ##RandomSSNs VALUES (DEFAULT); INSERT ##RandomSSNs VALUES (DEFAULT); SELECT * FROM ##RandomSSNs; Global Temporary tables name starts with a double hash ("##"). • Stored in the system database named tempdb. • Exists only during the current database session. • Useful for testing queries or for storing data temporarily in a complex script. Local vs global temporary tables • Scope of the local temporary table • Is only the session in which it is created and they are dropped automatically once the session ends and we can also drop them explicitly. • If created within a batch, then it can be accessed within the next batch of the same session. • If created within a stored procedure then it can be accessed in it’s child stored procedures, but it can’t be accessed outside the stored procedure. • Scope of global temporary table • Is both the session in which it is created and all other sessions. • Can be dropped explicitly or they will get dropped automatically when the session which created it terminates and none of the other sessions are using it. Temporary Table vs. Table Variable Temporary Table Table Variable Can use DDL statements ALTER, … Can’t modify it nor can drop it explicitly Not allowed in UDF Can be used in UDF Can add indexes explicitly Can have the implicit Indexes which are the result of Primary and Unique Key constraint. Can’t add explicit indexes Can have the implicit indexes which are created as a result of the Primary Key or Unique Key constraint defined during Table Variable declaration Honors the explicit transactions defined by the user. Doesn’t participate in the explicit transactions defined by the user. Derived tables • Standard tables and views are stored permanently on disk until they are explicitly deleted. • Derived tables and table variables are generally stored in memory, so they can provide the best performance. • Standard tables and temporary tables are always stored on disk and therefore provide lower performance. • Use derived tables for better performance instead of a table variable. • Use temporary table only when needed in other batches. SELECT MAX(age) FROM ( -- this part of the query is a derived table: SELECT age FROM table ) as Age -- must give derived table an alias Derived tables are used in the FROM clause Subqueries are used in the WHERE clause The five types of Transact-SQL table objects Type Scope Standard table Available within the system until explicitly deleted. Temporary table Available within the system while the current database session is open. Table variable Available within a script while the current batch is executing. Derived table Available within a statement while the current statement is executing. View Available within the system until explicitly deleted. A script that uses a TRY…CATCH statement BEGIN TRY INSERT Invoices VALUES (799, 'ZXK-799', ‘2021-05-07', 299.95, 0, 0, 1, ‘2021-06-06', NULL); PRINT 'SUCCESS: Record was inserted.'; END TRY BEGIN CATCH PRINT 'FAILURE: Record was not inserted.'; PRINT 'Error ' + CONVERT(varchar, ERROR_NUMBER(), 1) + ': ' + ERROR_MESSAGE(); END CATCH; The message that’s displayed FAILURE: Record was not inserted. Error 547: The INSERT statement conflicted with the FOREIGN KEY constraint "FK_Invoices_Vendors". The conflict occurred in database "AP", table "dbo.Vendors", column 'VendorID'. Unit 9: stored procedures, functions, and triggers Applied • Given the specifications for a database problem, write a stored procedure that solves it. Include data validation when necessary. • Given an expression, write a scalar-valued user-defined function based on the formula or expression. • Given a SELECT statement with a WHERE clause, write a table-valued user-defined function that replaces it. • Given the specifications for a database problem that could be caused by an action query, write a trigger that prevents the problem. • Given the specifications for a database problem that could be caused by a DDL statement, write a trigger that prevents the problem. Unit 9: stored procedures, functions, and triggers Knowledge • Explain why a stored procedure executes faster than an equivalent SQL script. • Describe the basic process for validating data within a stored procedure. • Describe the basic purpose of the system stored procedures. • Describe the two types of user-defined functions. • Describe the two types of triggers. • Describe the effects of the WITH ENCRYPTION and WITH SCHEMABINDING clauses on a stored procedure, user- defined function, or trigger. • Explain why you’d want to use the ALTER statement rather than dropping and recreating a procedure, function, or trigger. • Given a stored procedure, user-defined function, or trigger, explain what each statement does. Stored procedures • The scripts are executed right away • Stored procedures allow us to store data processing and execute later/multiple times • chunks of SQL code • stored in DB data dictionary A comparison of the different types of procedural SQL programs Type Batches How it’s stored How it’s executed Accepts parameters Script Multiple In a file on a disk Within a client tool No Stored procedure One only In an object in the database By a program or within a SQL script Yes User-defined function One only In an object in the database By a program or within a SQL script Yes Trigger One only In an object in the database Automatically by the server when an action query is executed No How to use How stored procedures are used • by SQL programmers to control who accesses the database and how (better access control security) • by application programmers to simplify their use of the database (precompiled performance) How user-defined functions are used • most often used by SQL programmers within the stored procedures and triggers they write • can also be used by application programmers and end users How triggers are used • by SQL programmers to prevent database errors when an action query is executed • by SQL programmers to provide for updatable views Functions vs. Procedures • Functions are very similar to procedures • stored in the DB, precompiled and hold on to some processing • Functions cannot modify data or the DB • SPs can modify data or the DB • Functions have to return exactly 1 value – (scalar or table) • SPs can return 1+ scalars, a table, or nothing • Functions are called within other SQL SELECT SomeFunc(SomeColumn) FROM MyTable; SELECT SomeFunc(SomeColumn), AnotherFunc(AnotherColumn) FROM MyTable; • SPs are called separately at a time using EXEC Functions • Can call a function inside of another function, stored procedure or a regular SQL statement • Regardless, you always use SELECT SELECT @someParam = MyFunc(); SELECT MyFunc(SomeColumn) AS MyData FROM SomeTable; SELECT GETDATE(); -- system function.. Triggers • A special type of an SP • It is executed automatically by DBMS when its preset condition is met • Change to the data INSERT, UPDATE, DELETE • Date or time does not cause triggers to execute Caution: • It is very easy to write a trigger that causes an infinite loop (by causing itself to be executed repeatedly, or carrying out an action that results in the trigger being executed again) • While triggers supports the same SQL language as SPs, they are sometimes more sensitive to the exact syntax The syntax of the RETURN statement for a stored procedure RETURN [integer_expression] A stored procedure that returns a value CREATE PROC spInvCount @DateVar smalldatetime = NULL, @VendorVar varchar(40) = '%' AS IF @DateVar IS NULL SELECT @DateVar = MIN(InvoiceDate) FROM Invoices; DECLARE @InvCount int; SELECT @InvCount = COUNT(InvoiceID) FROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorID WHERE (InvoiceDate >= @DateVar) AND (VendorName LIKE @VendorVar); RETURN @InvCount; If only a single integer value is returned, use return instead of output parameter If not specified, returns zero A script that calls the procedure BEGIN TRY EXEC spInsertInvoice 799,'ZXK-799’,’2021-05-01',299.95,1,’2021-06-01'; END TRY BEGIN CATCH PRINT 'An error occurred.'; PRINT 'Message: ' + CONVERT(varchar, ERROR_MESSAGE()); IF ERROR_NUMBER() >= 50000 PRINT 'This is a custom error message.'; END CATCH; The response from the system An error occurred. Message: Not a valid VendorID! This is a custom error message. The three types of user-defined functions Function type Description Scalar-valued function Returns a single value of any T-SQL data type. Simple table-valued function Returns a table that’s based on a single SELECT statement. Multi-statement table-valued function Returns a table that’s based on multiple statements. A statement that creates a scalar-valued function CREATE FUNCTION fnVendorID (@VendorName varchar(50)) RETURNS int BEGIN RETURN (SELECT VendorID FROM Vendors WHERE VendorName = @VendorName); END; A statement that invokes the scalar-valued function SELECT InvoiceDate, InvoiceTotal FROM Invoices WHERE VendorID = dbo.fnVendorID('IBM’); A statement that invokes a table-valued function SELECT * FROM dbo.fnTopVendorsDue(5000); Typically, prefix with it with fn Must specify the schema when invoking a UDF No action query against the db (no insert/update/delete) The syntax for creating a scalar-valued function CREATE FUNCTION [schema_name.]function_name ([@parameter_name data_type [= default]] [, ...]) RETURNS data_type [WITH [ENCRYPTION] [, SCHEMABINDING] [, EXECUTE_AS_clause]] [AS] BEGIN [sql_statements] RETURN scalar_expression END A trigger that works with DDL statements CREATE TRIGGER Database_CreateTable_DropTable ON DATABASE AFTER CREATE_TABLE, DROP_TABLE AS DECLARE @EventData xml; SELECT @EventData = EVENTDATA(); DECLARE @EventType varchar(100); SET @EventType = @EventData.value('(/EVENT_INSTANCE/EventType)[1]', 'varchar(100)'); IF @EventType = 'CREATE_TABLE' PRINT 'A new table has been created.'; ELSE PRINT 'A table has been dropped.'; PRINT CONVERT(varchar(max), @EventData); Unit 10 - Transactions Applied • Given a set of statements to be combined into a transaction, insert the Transact-SQL statements to explicitly begin, commit, and roll back the transaction. Knowledge • Describe the use of implicit transactions. • Describe the use of explicit transactions. • Describe the use of the COMMIT TRAN statement and the @@TRANCOUNT function within nested transactions. • Describe the use of save points. • Define these types of concurrency problems: lost updates, dirty reads, nonrepeatable reads, and phantom reads. Unit 10 - Transactions (cont.) • Describe the way locking and the transaction isolation level help to prevent concurrency problems. • Describe the way SQL Server manages locking in terms of granularity, lock escalation, shared locks, exclusive locks, and lock promotion. • Describe deadlocks and the way SQL Server handles them. • Describe four coding techniques that can reduce deadlocks. Transaction flow 1. Begin transaction 2. Do some processing 3. If succeeded Commit changes 4. If failed Rollback changes An example DECLARE @intErrorCode INT BEGIN TRAN UPDATE Authors SET Phone = '415 354-9866' WHERE au_id = '724-80-9391' SELECT @intErrorCode = @@ERROR IF (@intErrorCode <> 0) GOTO PROBLEM UPDATE Publishers SET city = ‘Wichita’, state = ‘Kansas' WHERE pub_id = '9999' SELECT @intErrorCode = @@ERROR IF (@intErrorCode <> 0) GOTO PROBLEM COMMIT TRAN PROBLEM: IF (@intErrorCode <> 0) BEGIN PRINT 'Unexpected error occurred!' ROLLBACK TRAN END Abstraction • Transactions are a programming abstraction that enables the DBMS to handle • Recovery & Durability: Keeping the DBMS data consistent and durable in the face of crashes, aborts, system shutdowns, etc. • Concurrency: Achieving better performance by parallelizing TXNs without creating anomalies • Critical for users, even casual users of data processing systems! Transaction Properties: ACID • Atomic • State shows either all the effects of transaction (commits), or none of them (aborts) • Consistent • Transaction moves from a state where integrity holds, to another where integrity holds (integrity constraints are always satisfied) • Isolated • Effect of transactions is the same as transactions running one after another (looks like batch mode) • Durable • Once a transaction has committed, its effects remain in the database (data is written now on the disk) Recovery and concurrency • In spite of failures: Power failures, but not media failures • Users may abort the program: need to “rollback the changes” • Need to log what happened • Many users executing concurrently • Can be solved via locking When to use explicit transactions • When you code two or more action queries that affect related data • When you update foreign key references • When you move rows from one table to another table • When you code a SELECT query followed by an action query and the values inserted in the action query are based on the results of the SELECT query • When a failure of any set of SQL statements would violate data integrity Pop quiz: Transactions to handle a) Recovery Durability b) C?????? P?????? Possible problems • T1 updates a row. • T2 reads the updated row before T1 commits the update. • If T1 rolls back the change, T2 will have read data that is considered never to have existed. Dirty Reads • T1 reads a row. • T2 updates or deletes that row and commits the update or delete. • If T1 rereads the row, it retrieves different row values or discovers that the row has been deleted. Nonrepeatable Reads • T1 reads a set of rows that satisfy some search criteria. • T2 generates a new row (through either an update or an insert) that matches the search criteria for T1. • If T1 reexecutes the statement that reads the rows, it gets a different set of rows.Phantoms Isolation Levels Transactions are not isolated from each other. They do not adversely affect other transactions. Transactions running at the Read Uncommitted level are usually read-only. Read Uncommitted The transaction waits until rows write-locked by other transactions are unlocked; this prevents it from reading any "dirty" data. Read Committed The transaction holds read locks on all rows it returns to the application and write locks on all rows it inserts, updates, or deletes. (SELECT Read Lock; DELETE Write Lock) Repeatable read The transaction releases its locks when it is committed or rolled back. The transaction holds a read lock or write lock on the range of rows it affects. No insert, update or delete is allowed. Serializable Isolation Levels Transactions are not isolated from each other. They do not adversely affect other transactions. Transactions running at the Read Uncommitted level are usually read-only. Read Uncommitted The transaction waits until rows write-locked by other transactions are unlocked; this prevents it from reading any "dirty" data. Read Committed The transaction holds read locks on all rows it returns to the application and write locks on all rows it inserts, updates, or deletes. (SELECT Read Lock; DELETE Write Lock) Repeatable read The transaction releases its locks when it is committed or rolled back. The transaction holds a read lock or write lock on the range of rows it affects. No insert, update or delete is allowed. Serializable Unit 11: Database Security Applied • Given the specifications for a new user’s security permissions, write the Transact-SQL statements that create the new user and grant the security permissions. • Given the specifications for a new user’s security permissions, use the Management Studio to create the new user and grant the security permissions. • Given the specifications for a set of security permissions for a database, write the Transact-SQL statements to create a new database role and assign users or groups to it. • Given the specifications for a set of security permissions for a server, write the Transact-SQL statements to create a new server role and assign logins to it. • Given the specifications for a set of security permissions, use the Management Studio to create a new role and assign users or groups to it. • Use SQL statements or the Management Studio to add users to the fixed server roles or the fixed database roles. Unit 11: Database Security (cont.) Knowledge • Identify the two ways that SQL Server can authenticate a login ID. • Identify the two SQL Server authentication modes. • Describe these terms: principals and securables. • Describe the difference between an object permission and a database permission. • Describe the guidelines for a strong password. • Describe what a user can do when given any of the standard permissions for a SQL Server object: Select, Update, Insert, Delete, Execute, References, and ALTER. • Describe the difference between a denied permission and a revoked permission. • Describe the two types of fixed roles provided by SQL Server: fixed server roles and fixed database roles. • Describe the use of application roles. Login vs. User Login (server principal) • server level object • has a password • can manipulate server settings but not individual databases • can have multiple users associated with it User (database principal) • database level object • has roles and permissions on objects in a single database How users gain access to a SQL Server database Permissions of fixed server roles OWASP Top 10 Application Security Risks for 2017 • A1: Injection • A2: Broken Authentication • A3: Sensitive Data Exposure • A4: XML External Entities (XXE) • A5: Broken Access Control • A6: Security Misconfiguration • A7: Cross-site Scripting (XSS) • A8: Insecure deserialization • A9: Using components with Known Vulnerabilities • A10: Insufficient Logging and Monitoring A1: Injection • Injection flaws, such as SQL injection, occur when untrusted data is sent to an interpreter as part of a command or query. The attacker’s hostile data can trick the interpreter into executing unintended commands or accessing data without proper authorization. https://www.owasp.org/images/7/72/OWASP_Top_10-2017_%28en%29.pdf.pdf A1: Prevention • Preventing injection requires keeping data separate from commands and queries. • Use a safe API, which avoids the use of the interpreter entirely or provides a parameterized interface, or migrate to use Object Relational Mapping Tools (ORMs). • Even when parameterized, stored procedures can still introduce SQL injection if T-SQL concatenates queries and data, or executes hostile data with EXECUTE IMMEDIATE or exec(). • Use positive or "whitelist" server-side input validation. This is not a complete defense as many applications require special characters, such as text areas or APIs for mobile applications. • For any residual dynamic queries, escape special characters using the specific escape syntax for that interpreter. • SQL structure such as table names, column names, and so on cannot be escaped, and thus user-supplied structure names are dangerous. This is a common issue in report-writing software. • Use LIMIT/TOP and other SQL controls within queries to prevent mass disclosure of records in case of SQL injection. SQL Injection select * from users where userID = ‘” + userFromSite + “’” • if userFromSite is xyz’ OR ‘1’ = ‘1 • What happens? • if userFromSite is xyz’; DROP TABLE users; • What happens? select * from users where userID = ‘xyz’ OR ‘1’ = ‘1’” SELECT ProdName, ProdDesc FROM Prod WHERE ProdNum = ProdNum SELECT ProdName, ProdDesc FROM Prod WHERE ProdNum = 127 SELECT ProdName, ProdDesc FROM Prod WHERE ProdNum = 127 OR 1=1 Expected Maliciously Injected Produced from the browser menu Improving SQL Server Performance 1. Design efficient schemas. 2. Optimize queries. 3. Fine-tune indexes. 4. Perform efficient transactions. 5. Build efficient stored procedures. Others: 1. Analyze and understand execution plans. 2. Identify and eliminate execution plan recompiles. 3. Avoid scalability pitfalls when you use SQL XML. 4. Tune Microsoft SQL Server. 5. Test and monitor your data access performance. 6. Consider how deployment decisions impact performance and scalability. Web technologies https://www.w3schools.com/xml/default.asp Page elements Styles Data formatResponsiveness Customizations HTML JavaScript PHP XML CSS .NETC#Python Unit 12 - XML Applied • Given the specifications for a table that requires an xml data type, create the table and then add and retrieve XML data from the table. • Use the XML Editor to display the XML returned by a query. • Use the XML Editor to create an XML Schema Definition for an XML document. • Use the methods of the xml type to work with XML data. • Given an XML Schema Definition, code a statement that adds it to a database. • Use an XML Schema Definition to provide XML validation for a column or variable. Unit 12 - XML (cont.) Knowledge • Describe the structure and content of an XML document. • Describe the use of an XML schema. • Describe the use of an XQuery. • Describe the use of XML DML. • Describe XML validation. • Describe the basic function of the FOR XML clause of the SELECT statement. • Describe the basic function of the OPENXML statement. A trigger that inserts an XML document into the xml column CREATE TRIGGER Database_CreateTable_DropTable ON DATABASE AFTER CREATE_TABLE, DROP_TABLE AS DECLARE @EventData xml; SELECT @EventData = EVENTDATA(); INSERT INTO DDLActivityLog VALUES (@EventData); A CREATE TABLE statement that fires the trigger CREATE TABLE VendorsTest (VendorID int, VendorName varchar(50)); returns data only when referenced directly inside of a DDL or trigger. E/R, OODB, XML models • Entity/Relationship Data Model • A set of entities and relationships – rows: the instances of the modeled entity – connections: the relationships between them • Object-oriented Data Model (OODM) • Objects contain data and methods represent the relationships • Classes are collections of similar objects with shared structure/behavior • Uses Unified Modeling Language (UML) • Semantic helps clarity, inheritance helps data integrity • Lacking accepted standards, steeper learning curve, runtime inefficiency • XML: Extensible Markup Language • Flexible and extensible • Used for data exchange between disparate platforms and applications • Can be bulky and verbose • Tagged elements, attributes on these elements Data Warehousing • DB Warehouse • Extraction: what data should be extracted from the operational db. • Transformation: additional processing. • Loading: the data is added to the data warehouse. • Data warehousing: • “An integrated, subject-oriented, time-variant, nonvolatile collection of data supporting decision-making” • A read-only source. No updates to existing warehouse data. • Information may be structured, or not. • Purpose is to guide business strategy. Data Warehousing (contd.) • DB Warehouse • Extraction: what data should be extracted from the operational db. • Transformation: additional processing. • Loading: the data is added to the data warehouse. • Data warehousing: • “An integrated, subject-oriented, time-variant, nonvolatile collection of data supporting decision-making” • A read-only source. No updates to existing warehouse data. • Information may be structured, or not. • Purpose is to guide business strategy. Data Analytics • Data mining is the use of mathematical, statistical, and modeling techniques to extract knowledge from data, relationships, characteristics, trends, … • To discover, explain, and predict. Societal impacts Bill of data rights Autonomous Databases Cloud Databases – DBaaS
欢迎咨询51作业君