This is not a comparison meant to decide which database is best or has the most valuable features. It is written from the point of view of a programmer who often has to work with a certain database simply because it cannot be changed in the project. Between the lines it is of course also a cry for standards, but as long as the different vendors don't follow the standards, it offers a guideline for workarounds in case of compatibility issues.
ORACLE | MySQL | MS-SQL |
---|---|---|
to_date('28.01.08','dd.mm.yy') | str_to_date('28.01.08','%d.%m.%y') | convert(date,'28.01.2008',104) |
'text ' || string || chr(39) | concat('text ',string, char(39)) | 'text ' + string + char(39) |
instr('long ... text', 'needle') | position('needle' in 'long ... text') | patindex('%needle%', 'long...text') |
GROUP BY ROLLUP(author_id) | GROUP BY author_id WITH ROLLUP | GROUP BY ROLLUP (author_id) |
ELSIF | ELSEIF | n.a. (use nested IF ... ELSE) |
NVL(expr,value) | IFNULL(expr,value) | ISNULL(expr,value) |
CREATE PROCEDURE sp_name( p1 IN NUMBER ) IS | CREATE PROCEDURE sp_name( IN p1 INTEGER ) | CREATE PROCEDURE sp_name @p1 SMALLINT = 1 AS |
var_a INTEGER; BEGIN | BEGIN DECLARE var_a INTEGER; | SET NOCOUNT ON BEGIN TRY DECLARE @var_a INT |
 | DECLARE EXIT HANDLER FOR NOT FOUND SET var_a = 0; | |
var_a := 55; | SET var_a = 55; | SET @var_a = 55; |
SELECT col_a INTO var_a FROM a_table WHERE rownum <= 1; | SELECT col_a INTO var_a FROM a_table LIMIT 1; | SELECT TOP(1) @var_a = col_a FROM a_table; (Note: *) |
EXCEPTION WHEN NO_DATA_FOUND THEN var_a := 0; | | END TRY; BEGIN CATCH SET @var_a = 0; |
END; | END | END CATCH; |
In contrast to the other database engines, T-SQL does not raise an error if the select fetches multiple rows. It walks through the result set and assigns the value of the current row to the variable again and again, so the variable ends up with the column value of the last processed row. This can lead to logical errors in your procedure that are difficult to find. If you want to make sure that you always get the value of the first row, add TOP(1) after the SELECT as demonstrated above.
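A minimal sketch of the pitfall (assuming a hypothetical table a_table whose column col_a contains the values 1, 2 and 3):

```sql
DECLARE @var_a INT
-- T-SQL raises no error here although three rows match;
-- the variable is reassigned for every row of the result set
SELECT @var_a = col_a FROM a_table ORDER BY col_a
-- @var_a now holds the value of the last processed row (here: 3)
```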
Starting with release 11g Oracle offers an aggregation function LISTAGG that is the counterpart of MySQL's GROUP_CONCAT.
SQL> SELECT g_ID, min(f_ID), max(f_ID),
            LISTAGG(f_ID, ',') WITHIN GROUP (ORDER BY f_ID) AS list_of_IDs
     FROM belong_to GROUP BY g_ID;

g_ID  min(f_ID) max(f_ID) list_of_IDs
----- --------- --------- ------------------------------
cb           69        78 69,70,71,72,73,74,75,76,77,78
ndf          79        87 79,80,81,82,83,84,85,86,87
nv           60        68 60,61,62,63,64,65,66,67,68
suk           1         9 1,2,3,4,5,6,7,8,9
wes          29        37 29,30,31,32,33,34,35,36,37
The MySQL syntax looks more consistent because all specifications for the concatenation are inside the parentheses, as part of the parameter of the GROUP_CONCAT function. The additional SEPARATOR keyword also seems more natural than Oracle's WITHIN GROUP keyword.
mysql> SELECT g_ID, min(f_ID), max(f_ID),
              GROUP_CONCAT(f_ID ORDER BY f_ID SEPARATOR ',') AS list_of_IDs
       FROM belong_to GROUP BY g_ID;
+------+-----------+-----------+-------------------------------+
| g_ID | min(f_ID) | max(f_ID) | list_of_IDs                   |
+------+-----------+-----------+-------------------------------+
| cb   |        69 |        78 | 69,70,71,72,73,74,75,76,77,78 |
| ndf  |        79 |        87 | 79,80,81,82,83,84,85,86,87    |
| nv   |        60 |        68 | 60,61,62,63,64,65,66,67,68    |
| suk  |         1 |         9 | 1,2,3,4,5,6,7,8,9             |
| wes  |        29 |        37 | 29,30,31,32,33,34,35,36,37    |
+------+-----------+-----------+-------------------------------+
PostgreSQL achieves the same in a two-step process. There is no aggregation function that returns a concatenated string, but PostgreSQL can return an aggregated array. And since it supports join and split functionality to move from arrays to strings and back, you only need to chain the array_agg and array_to_string functions:
SELECT g_ID, min(f_ID), max(f_ID),
       array_to_string(array_agg(f_ID), ',') AS list_of_IDs
FROM belong_to GROUP BY g_ID;
With PostgreSQL 9 it is possible to use a short form that is similar to what MySQL and Oracle offer. Instead of chaining array_agg and array_to_string you can simply use string_agg(f_ID, ','). It is also possible to add an ORDER BY clause to the aggregate expression of array_agg or string_agg:
array_agg( f_ID ORDER BY f_ID DESC ), string_agg(f_ID, ',' ORDER BY f_ID )
Oracle has a specific syntax for hierarchical queries, CONNECT BY, together with the pseudo column LEVEL. In MS-SQL you need to implement this with a Common Table Expression that references itself. The concept is analogous to linked list structures in C, where you already use the name of the structure while you are still defining it. As in C you need a kind of declaration in advance: you can't use the simple WITH profile_tree AS (...) but have to list the fields of the CTE explicitly.
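For comparison, a sketch of how the same tree walk could look in Oracle (assuming the same profile table with id, owner_id and last_name; the CONNECT BY direction mirrors the join condition t.owner_id = p.id of the CTE):

```sql
SELECT LEVEL, id, owner_id,
       SYS_CONNECT_BY_PATH(last_name, '/') AS path_name
FROM profile
START WITH id = 1            -- start at the root node
CONNECT BY id = PRIOR owner_id;
```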
WITH profile_tree (level, id, owner_id, path_name) AS (
    SELECT 1 AS level, p.id, p.owner_id,
           cast(N'/' COLLATE Latin1_General_CS_AS + p.last_name AS NVARCHAR(60)) AS path_name
    FROM profile p
    WHERE p.id = 1  /* start at the root node (usually a parameter) */
    UNION ALL
    SELECT t.level + 1 AS level, p.id, p.owner_id,
           cast(t.path_name + N'/' + p.last_name AS NVARCHAR(60)) AS path_name
    FROM profile p
    INNER JOIN profile_tree t ON t.owner_id = p.id
) /* This is the end of the Common Table Expression */
SELECT level, id, owner_id, path_name FROM profile_tree
PostgreSQL doesn't require a formal table declaration in advance; instead it needs the special RECURSIVE keyword to prepare the parser for a reference to the CTE inside the definition of the CTE. And it is not so picky about the type of the combined path_name column: it can just be VARCHAR and doesn't need a type cast to fix the length of that field.
WITH RECURSIVE profile_tree AS (
    SELECT 1 AS level, p.id, p.owner_id,
           '/' || p.last_name AS path_name
    FROM profile p
    WHERE p.id = 1  /* start at the root node (usually a parameter) */
    UNION ALL
    SELECT t.level + 1 AS level, p.id, p.owner_id,
           t.path_name || '/' || p.last_name AS path_name
    FROM profile p
    INNER JOIN profile_tree t ON t.owner_id = p.id
)
SELECT level, id, owner_id, path_name FROM profile_tree
The classical WHERE id IN (select max(id) from ...) to figure out the latest (or highest) record in a table might be inefficient because it runs two queries: the select for the maximum value in the subselect, and the outer select. A SELECT ... FROM ... WHERE ... ORDER BY id DESC LIMIT 1 needs only a single pass through the table and should be substantially faster.
But what if you not only want the absolute maximum, but also the relative maximum for each month or each genre? Instead of comparing a scalar value to a one-dimensional list you have to compare a tuple with 2 elements to a 2 x m result set of the subquery.
SELECT b.g_ID, b.f_ID, f.year, f.director
FROM belong_to b
INNER JOIN films f ON b.f_ID = f.f_ID
WHERE (b.g_ID, b.f_ID) IN (SELECT g_ID, max(f_ID) FROM belong_to GROUP BY g_ID);
+------+------+------+-------------------+
| g_ID | f_ID | year | director          |
+------+------+------+-------------------+
| suk  |    9 | 1947 | John Huston       |
| fn   |   28 | 1939 | Raoul Walsh       |
| wes  |   37 | 1993 | Detlev Buck       |
| sc   |   59 | 1985 | Martin Scorsese   |
| nv   |   68 | 1965 | Jean-Luc Godard   |
| ndf  |   87 | 1982 | Michael Verhoeven |
+------+------+------+-------------------+
This concept works in MySQL, PostgreSQL and Oracle, but the MS-SQL server does not support such expressions.
The reason why such a tuple comparison is not allowed in MS-SQL might be that the above query can easily be rewritten as a join query:
SELECT b.g_ID, b.f_ID, f.year, f.director
FROM belong_to b
INNER JOIN films f ON b.f_ID = f.f_ID
INNER JOIN (SELECT g_ID, max(f_ID) AS maxid
            FROM belong_to GROUP BY g_ID) c
        ON b.g_ID = c.g_ID AND b.f_ID = c.maxid
GO

g_ID  f_ID    year   director
----- ------- ------ -------------------------
suk   9       1947   John Huston
fn    28      1939   Raoul Walsh
wes   37      1993   Detlev Buck
sc    59      1985   Martin Scorsese
nv    68      1965   Jean-Luc Godard
ndf   87      1982   Michael Verhoeven
The WHERE ... IN is usually used when we want to avoid that rows in one table appear multiple times because the field or fields in the join condition appear multiple times in the other table. If you have IN (1,2,1) you will get only 2 records, but with a join you will get 3 records. But due to the GROUP BY it is guaranteed in this case that each combination of tuples appears only once.
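The duplicate behaviour can be sketched with a tiny example (assuming a hypothetical table t that contains exactly the two rows id = 1 and id = 2):

```sql
-- IN collapses the duplicate value: 2 result rows
SELECT * FROM t WHERE id IN (1, 2, 1);

-- the equivalent join sees the value 1 twice: 3 result rows
SELECT t.*
FROM t
INNER JOIN (SELECT 1 AS id UNION ALL
            SELECT 2 UNION ALL
            SELECT 1) v ON t.id = v.id;
```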
But this is the straightforward constellation, where the primary key column is used to build the aggregate values. It is different if we don't want the highest ID but the latest year.
SELECT x.g_ID, x.f_ID, f.year, f.director, f.title
FROM belong_to x
INNER JOIN films f ON x.f_ID = f.f_ID
INNER JOIN (SELECT x.g_ID, max(x.f_ID) AS max_id, max(f.year) AS max_year
            FROM belong_to x
            INNER JOIN films f ON x.f_ID = f.f_ID
            GROUP BY x.g_ID) m
        ON x.g_ID = m.g_id AND f.year = m.max_year
GO

g_ID  f_ID    year   director             title
----- ------- ------ -------------------- -----------------
suk   7       1949   Jean Cocteau         Orphée
fn    19      1992   Steven Soderbergh    Kafka
wes   29      1995   Jim Jarmusch         Dead Man
sc    51      1995   Woody Allen          Mighty Aphrodite
nv    66      1994   Gianni Amelio        Lamerica
cb    78      1995   Danny Boyle          Trainspotting
ndf   83      1995   Dieter Wedel         Der Schattenmann
out   90      1995   Wayne Wang           Blue in the Face
out   102     1995   Wayne Wang           Smoke
des   111     1993   Robert Altman        Short Cuts
If the data is post-processed it is very likely that the follow-up process cannot handle the double entry for genre out. Windowing expressions allow us to use two columns for sorting:
SELECT g_ID, f_ID, year, director, title
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY x.g_ID
                              ORDER BY f.year DESC, f.f_ID DESC) row_num,
           x.g_ID, f.*
    FROM belong_to x
    INNER JOIN films f ON x.f_ID = f.f_ID
) src
WHERE row_num = 1
ORDER BY 2
GO

g_ID  f_ID    year   director              title
----- ------- ------ --------------------- -----------------------
suk   7       1949   Jean Cocteau          Orphée
fn    19      1992   Steven Soderbergh     Kafka
wes   29      1995   Jim Jarmusch          Dead Man
sc    51      1995   Woody Allen           Mighty Aphrodite
nv    66      1994   Gianni Amelio         Lamerica
cb    78      1995   Danny Boyle           Trainspotting
ndf   83      1995   Dieter Wedel          Der Schattenmann
out   102     1995   Wayne Wang            Smoke
des   111     1993   Robert Altman         Short Cuts
Another approach uses CROSS APPLY, but this time the genre out is completely missing from the list:
SELECT g.g_ID, src.f_ID, src.year, src.director, src.title
FROM genres g
CROSS APPLY (SELECT TOP(1) b.g_ID, b.f_ID, f.year, f.director, f.title
             FROM belong_to b
             INNER JOIN films f ON b.f_ID = f.f_ID
             WHERE b.g_ID = g.g_ID
             ORDER BY f.year DESC, b.f_ID DESC) src
GO

g_ID  f_ID    year   director              title
----- ------- ------ --------------------- ------------------------
suk   7       1949   Jean Cocteau          Orphée
fn    19      1992   Steven Soderbergh     Kafka
wes   29      1995   Jim Jarmusch          Dead Man
sc    51      1995   Woody Allen           Mighty Aphrodite
nv    66      1994   Gianni Amelio         Lamerica
cb    78      1995   Danny Boyle           Trainspotting
ndf   83      1995   Dieter Wedel          Der Schattenmann
des   111     1993   Robert Altman         Short Cuts
Oracle implements dynamic SQL with the help of the PL/SQL procedural language. And it supports two types of dynamic SQL:
BEGIN
  v_stmt := 'SELECT id, title, director FROM dvds WHERE year BETWEEN :x AND :y';
  h_cursor := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(h_cursor, v_stmt, DBMS_SQL.NATIVE);
  -- Input variables
  DBMS_SQL.BIND_VARIABLE(h_cursor, ':x', v_oldest);
  DBMS_SQL.BIND_VARIABLE(h_cursor, ':y', v_youngest);
  -- Output variables
  DBMS_SQL.DEFINE_COLUMN(h_cursor, 1, v_id);
  DBMS_SQL.DEFINE_COLUMN(h_cursor, 2, v_title, 40);
  DBMS_SQL.DEFINE_COLUMN(h_cursor, 3, v_director, 30);
  v_ignore := DBMS_SQL.EXECUTE(h_cursor);
  LOOP
    IF DBMS_SQL.FETCH_ROWS(h_cursor) > 0 THEN
      -- get column values of the row
      DBMS_SQL.COLUMN_VALUE(h_cursor, 1, v_id);
      DBMS_SQL.COLUMN_VALUE(h_cursor, 2, v_title);
      DBMS_SQL.COLUMN_VALUE(h_cursor, 3, v_director);
    ELSE
      EXIT;  -- No more rows to copy
    END IF;
  END LOOP;
  DBMS_SQL.CLOSE_CURSOR(h_cursor);
END;
Using the OPEN FOR statement and a cursor variable does the same and is a lot easier. (The cursor variable is the type of variable for storing result-set-like data that we were missing in the above approach.)
DECLARE
  -- Cursor is only needed to easily create a record for the dynamic cursor
  CURSOR cur_spec IS SELECT id, title, director FROM dvds;
  rec_movie cur_spec%ROWTYPE;
  v_cursor  SYS_REFCURSOR;
  v_from    INTEGER;
  v_until   INTEGER;
  v_stmt    VARCHAR2(200);
BEGIN
  v_from  := 1956;
  v_until := 2000;
  v_stmt  := 'SELECT id,title,director FROM dvds WHERE year BETWEEN :x AND :y';
  OPEN v_cursor FOR v_stmt USING v_from, v_until;
  LOOP
    FETCH v_cursor INTO rec_movie;
    EXIT WHEN v_cursor%NOTFOUND;
    dbms_output.put_line('Movie: ' || rec_movie.title);
  END LOOP;
  CLOSE v_cursor;
END;
MySQL does this directly in the mysql terminal with the help of session variables:
mysql> SET @qry = 'SELECT id, title, director, year, length FROM dvds WHERE year BETWEEN ? AND ?';
Query OK, 0 rows affected (0.02 sec)

mysql> PREPARE dynsql FROM @qry;
Query OK, 0 rows affected (0.01 sec)
Statement prepared

mysql> SET @oldest = 1967;
mysql> SET @youngest = 1976;
mysql> EXECUTE dynsql USING @oldest, @youngest;
+----+----------------------------+----------------------+------+--------+
| id | title                      | director             | year | length |
+----+----------------------------+----------------------+------+--------+
| 24 | Spiel mir das Lied vom Tod | Sergio Leone         | 1968 |    159 |
| 27 | Badlands                   | Terence Malick       | 1973 |     90 |
| 58 | The Wild Bunch             | Sam Peckinpah        | 1969 |    139 |
| 71 | Hundstage                  | Sidney Lumet         | 1975 |    119 |
+----+----------------------------+----------------------+------+--------+
4 rows in set (0.00 sec)

mysql> SET @youngest = 1971;
mysql> EXECUTE dynsql USING @oldest, @youngest;
mysql> DEALLOCATE PREPARE dynsql;
(see also Information for MySQL Migrations: Triggers and Stored Procedures)
Oracle's PL/SQL has a fairly complete set of methods to pass structured data to a procedure: you can either use a cursor variable or a collection of records (known as a PL/SQL index-by table) for result-set-like data, or you can just pass a record variable for "one row" values. This is handy if you have to specify a lot of parameters for the procedure. You don't need to declare every single parameter (which turns out to be a maintenance nightmare if you have more than 5 or 6 parameters); instead you can hide the individual parameters as attributes (or members) inside the record variable. You won't need to change the signature of your procedure or function, nor the parameter list at the places where you call the procedure. You only need to add another member to the record definition and recompile your code.
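A hypothetical sketch of this pattern (the package, procedure and member names are made up for illustration):

```sql
CREATE OR REPLACE PACKAGE person_api AS
  -- all "parameters" live inside the record; adding a member
  -- later doesn't change the procedure signature
  TYPE person_rec IS RECORD (
    id         NUMBER,
    firstname  VARCHAR2(40),
    lastname   VARCHAR2(40),
    birth_date DATE
  );
  PROCEDURE save_person(p_person IN person_rec);
END person_api;
/
```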
MS-SQL tends to be a nightmare when you need to create a procedure that depends on several attributes as parameters. In many cases these are some of the values of a table row. Passing the table row as a record is not possible; instead you have to declare a variable for each column value of the table row that you want to pass as a parameter. Table variables make your life easier in such a case. But because MS-SQL doesn't have the concept of packages you can't declare the record, cursor or table variable close to the procedure in the same package; you have to do it at database level:
CREATE TYPE dbo.param_tab AS TABLE (
    some_id BIGINT NOT NULL,
    name    NVARCHAR(255),
    a_date  DATETIME
)

CREATE PROCEDURE process_the_data ( @params dbo.param_tab READONLY )
AS
BEGIN
    SELECT TOP(1) ...
    FROM @params p
    INNER JOIN other_table o ON p.some_id = o.id
    WHERE ....
END
The attribute READONLY is required, and it shows that in contrast to Oracle's cursor or record variables the table variable cannot be changed.
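Calling such a procedure could then look like this (a sketch based on the type defined above; the values are made up):

```sql
DECLARE @p dbo.param_tab
INSERT INTO @p (some_id, name, a_date)
VALUES (4711, N'Smith', GETDATE()),
       (4712, N'Jones', GETDATE())
EXEC process_the_data @params = @p
```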
A similar inconvenience appears in MS-SQL when you fetch from a cursor. You can't fetch into a record variable but have to declare a variable with the appropriate type for each field that is selected in the cursor. Maybe this inconvenience is on purpose, to remind people that cursors and loops are procedural thinking and against the spirit of SQL.
DECLARE @x INT = 1956, @y INT = 2000
DECLARE cur_movies CURSOR FOR
    SELECT id, title, director, year FROM dvds WHERE year BETWEEN @x AND @y
BEGIN
    DECLARE @aValue INT
    DECLARE @id INT, @year INT, @title NVARCHAR(255), @director NVARCHAR(250)
    OPEN cur_movies
    FETCH NEXT FROM cur_movies INTO @id, @title, @director, @year
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- do something with the variables that were filled by the cursor row
        SET @aValue = @year + @id
        FETCH NEXT FROM cur_movies INTO @id, @title, @director, @year
    END
    CLOSE cur_movies
    DEALLOCATE cur_movies
END
The @@FETCH_STATUS cursor attribute variable is set by the last FETCH NEXT. If FETCH NEXT returned a boolean value it could be used directly as the condition for the WHILE. But because it only writes into and modifies the global pseudo variable, fetching and testing the result of the fetch is split into two statements, and hence we need the FETCH statement twice: once before the WHILE and a second time at the end of the WHILE's execution block. And sadly each time we have to give the full list of variables after the INTO. Better don't ask me how often I have forgotten to add a new variable to the second FETCH NEXT statement ...
TABLE(...). PostgreSQL knows about functions that return a set of records (where the type of these records has to be predefined as a custom type) or a TABLE(..colspec..) where the returned composite type is specified in the function signature and can return n x m structured data. Both database engines treat such functions as if they were views, and consequently you can select from them like SELECT * FROM my_func(23).
PostgreSQL has a very similar concept, but because the body of the function doesn't contain any procedural stuff, the language plpgsql is not needed and SQL as language is good enough. The following function generates some basic DDL for the available tables with a query against the information_schema. The sizes that are passed as parameters for the formatting of the generated SQL have to go deep inside the subselect; it is nearly impossible to do this with a simple view.
CREATE OR REPLACE FUNCTION re_create_tables(p_name_size integer, p_colspec_size integer)
RETURNS TABLE(table_spec text) AS $$
  SELECT concat(a.tab_name, a.column_name, a.column_spec) AS table_spec
  FROM (SELECT CASE ordinal_position
                 WHEN 1 THEN concat('CREATE TABLE ', table_name, ' (', chr(10))
                 ELSE ''
               END AS tab_name,
               table_name, ordinal_position,
               rpad(column_name, p_name_size) AS column_name,
               rpad(concat(CASE upper(data_type)
                             WHEN 'CHARACTER VARYING' THEN 'VARCHAR'
                             ELSE upper(data_type)
                           END,
                           CASE
                             WHEN upper(data_type) IN ('CHARACTER VARYING')
                               THEN format('(%s)', character_maximum_length)
                             WHEN upper(data_type) IN ('BIGINT','INTEGER','SMALLINT')
                               THEN ''
                             WHEN upper(data_type) IN ('NUMERIC','DECIMAL')
                               THEN format('(%s,%s)', numeric_precision, numeric_scale)
                             ELSE ''
                           END,
                           CASE is_nullable WHEN 'NO' THEN ' NOT NULL,' ELSE ',' END),
                    p_colspec_size) AS column_spec
        FROM information_schema.columns
        WHERE table_schema = 'public') a
  ORDER BY table_name, ordinal_position
$$ LANGUAGE SQL;

/* Another function that uses plpgsql as language (which is the wrong language here) */
CREATE OR REPLACE FUNCTION get_film(p_pattern VARCHAR)
RETURNS TABLE ( film_title VARCHAR, film_year INT ) AS $$
BEGIN
  RETURN QUERY
    SELECT title, cast(release_year AS integer)
    FROM film
    WHERE title LIKE p_pattern;
END;
$$ LANGUAGE 'plpgsql';
Oracle and PostgreSQL allow you to pass around cursor variables (which can even be scrollable), and that is much cleaner than the MS-SQL and MySQL approach of simply selecting into the output stream inside stored procedures.
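As a sketch, an Oracle function that hands a cursor variable to its caller (the function name is made up; the dvds table is assumed from the earlier examples):

```sql
CREATE OR REPLACE FUNCTION movies_of_year(p_year IN NUMBER)
  RETURN SYS_REFCURSOR
IS
  v_cursor SYS_REFCURSOR;
BEGIN
  OPEN v_cursor FOR
    SELECT id, title, director FROM dvds WHERE year = p_year;
  RETURN v_cursor;  -- the caller fetches from the cursor variable
END;
/
```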
SQL 92 | Oracle | MySQL | PostgreSQL | MS-SQL |
---|---|---|---|---|
coalesce() | Yes since 8i, before nvl() | Yes, since 3.23.3 | Yes | Yes, since 2008, before isnull() |
case ... when | Yes since 9i, before decode() | Yes | Yes | Yes |
extract(.. from ..) | only a few unit specifiers | Yes | Yes | No, use DATEPART(year, date_expr) |
position(.. in ..) | No, use instr() | Yes, as a subset of locate() | Yes | No, use patindex() or charindex() |
auto increment | no (needs sequence/trigger) | INT AUTO_INCREMENT | (BIG)SERIAL | IDENTITY |
 | table(nested table) | | unnest(array-type) | |
INTERSECT | Yes | No, use INNER join (a) | Yes | Yes |
EXCEPT | No, use MINUS | No, use OUTER join (b) | Yes | Yes |
-- (a)
SELECT DISTINCT a.year AS common_years
FROM films a
INNER JOIN films b ON a.year = b.year
INNER JOIN belong_to ab ON a.f_ID = ab.f_ID
INNER JOIN belong_to bb ON b.f_ID = bb.f_ID
WHERE ab.g_ID = 'wes' AND bb.g_ID = 'fn'
+--------------+
| common_years |
+--------------+
|         1992 |
+--------------+

-- (b)
SELECT DISTINCT a.year AS exclusive_years
FROM films a
INNER JOIN belong_to ab ON a.f_ID = ab.f_ID
LEFT JOIN (SELECT b.year
           FROM films b
           INNER JOIN belong_to bb ON b.f_ID = bb.f_ID
           WHERE bb.g_ID = 'wes') n
       ON a.year = n.year
WHERE ab.g_ID = 'fn' AND n.year IS NULL
+-----------------+
| exclusive_years |
+-----------------+
|            1995 |
|            1993 |
|            1948 |
|            1943 |
|            1954 |
|            1957 |
|            1950 |
+-----------------+
There are two approaches for this problem:
Oracle and PostgreSQL both support the RETURNING INTO clause to get hold of the last used ID value. Actually this syntax can also serve other purposes, not only auto increment values.

INSERT INTO persons (firstname,lastname) VALUES ('Peter','Smith')
RETURNING the_id INTO var_id;

MS-SQL uses a different syntax for the same concept:
INSERT INTO persons (firstname,lastname) OUTPUT inserted.the_id VALUES ('Peter','Smith')
The MS-SQL command creates a result set. This means that in the host language you have to treat this insert just like you would treat a select statement. For Oracle and PostgreSQL you have to prepare an output variable that you bind to a specific host variable (or, of course, in PL/SQL and PL/pgSQL a variable declaration in the surrounding block is enough).
For all database engines the listed functions have the scope of the active session. A parallel insert from a second session won't modify the returned value, so they are safe, except if the insert fires one or more triggers that also use auto increment values. Of course the additional inserts made by the triggers happen in the same session (they are even in the same transaction) and will affect the returned value of these functions.
- MySQL has the LAST_INSERT_ID() function for handling this in the database and an analogous function mysql_insert_id() if you need to retrieve this value from a host language like C, Java or Perl.
- MS-SQL's SCOPE_IDENTITY() function ignores the modification by triggers, in contrast to the older method with the server variable @@IDENTITY. And surprise, MS-SQL returns the ID of the last inserted row in case of a multiple row insert.
- PostgreSQL has LASTVAL() for the same purpose.
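In their simplest form the three functions are used right after the insert, in the same session (a sketch, reusing the persons table from above):

```sql
-- MySQL
INSERT INTO persons (firstname, lastname) VALUES ('Peter', 'Smith');
SELECT LAST_INSERT_ID();

-- MS-SQL
INSERT INTO persons (firstname, lastname) VALUES ('Peter', 'Smith');
SELECT SCOPE_IDENTITY();

-- PostgreSQL
INSERT INTO persons (firstname, lastname) VALUES ('Peter', 'Smith');
SELECT LASTVAL();
```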
The MS-SQL method BULK INSERT is located somewhere between Oracle's external tables and the MySQL approach LOAD DATA [LOCAL] INFILE. Both BULK INSERT and external tables need the data files to be located on the server's file system. MySQL is not limited to this if you use the optional keyword LOCAL. But the syntax and concept of BULK INSERT are very similar to LOAD DATA INFILE, as both need an existing "real" table into which the data will be inserted.
But even if you can do it with MySQL, using a file on the client's file system might be against the spirit of these bulk loading tools. It obviously means that the data has to travel over the network in order to get into the database, and this includes quite a lot of network protocol overhead.
BULK INSERT an_existing_table
FROM 'C:\path\on\the\server\file.csv'
WITH (
    DATAFILETYPE = 'char',  -- single byte encoding
    CODEPAGE = 'ACP',       -- Ansi Code Page (windows-1252)
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '\n'
)
LOAD DATA LOCAL INFILE '/home/rolf/data/voicemail-calls.csv'
INTO TABLE voicemail
FIELDS TERMINATED BY ';'
(id, calling_no, called_no, calling_time, week_day, isdn_cause, proto_err);
BULK INSERT has a cousin, OPENROWSET .. BULK, that works more like the external tables of Oracle. It reads the flat data file every time it is accessed and keeps the data in memory.
SELECT m.empid, m.firstname, m.lastname
FROM OPENROWSET(
    BULK 'C:\path\on\the\server\file.csv',
    FORMATFILE = 'C:\path\on\the\server\employees.fmt') AS m;
Both Oracle and MS-SQL have a TRUNCATE TABLE statement, but there is a crucial difference between them. In Oracle it is a DDL statement and requires the DROP TABLE privilege; like all DDL statements it executes an implicit COMMIT and breaks any transaction border of the surrounding statements.
The MS-SQL server treats a TRUNCATE TABLE statement as a super DELETE that can't have a filter criterion and collects no row-based transaction information. But if the TRUNCATE is inside a transaction there is still the chance to revert the effects of the TRUNCATE later on with a ROLLBACK.
Oracle's approach is more consistent, although many users won't like it because they learned it the hard way and lost their data when using TRUNCATE in an Oracle database. The keyword TABLE should alert you that you are not performing normal data manipulation but are executing a data definition command.
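The difference can be demonstrated with a few lines (a sketch, assuming a throwaway table test_data; in MS-SQL the rows reappear, while in Oracle the implicit COMMIT makes a rollback impossible):

```sql
-- MS-SQL: TRUNCATE participates in the transaction
BEGIN TRANSACTION
TRUNCATE TABLE test_data
ROLLBACK TRANSACTION      -- the rows of test_data are back

-- Oracle: the DDL statement commits implicitly
TRUNCATE TABLE test_data;
ROLLBACK;                 -- too late, the rows are gone
```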
update test_data set label=description, description = label;
The above statement should simply exchange the data in label with the values in description. This works in PostgreSQL, MS-SQL and Oracle, but it doesn't work in MySQL. When MySQL evaluates description = label it doesn't use the original input value of label but the one after processing the assignment of description to label. So in the end both columns label and description contain the value from the original description column.
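A workaround sometimes used in MySQL parks the original value in a user variable before the SET assignments are processed. This is only a sketch: it relies on the WHERE part being evaluated per row before the assignments, and the filter also skips rows whose label is NULL.

```sql
-- @tmp receives the original label in the WHERE part,
-- before MySQL processes the SET assignments for that row
UPDATE test_data
   SET label = description,
       description = @tmp
 WHERE (@tmp := label) IS NOT NULL;
```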
Luckily, most of the time you need min, max or sum as aggregate functions, and they all behave as expected. But if you use avg in MS-SQL you will look at the data and think that the values look strange and wrong.
SELECT avg(n) AS average
FROM (VALUES (0),(1),(1),(1)) AS num(n)

average
-----------
0
On second sight you will realize that the SQL Server works with integer values and even applies integer arithmetic when dividing the sum of all values by the number of rows. So the correct way to build the average requires a type cast to float or decimal first:
SELECT avg(cast(n AS float)) AS average
FROM (VALUES (0),(1),(1),(1)) AS num(n)

average
------------------------
0.75