HIVE语法
0.数据类型 TINYINT - byte SMALLINT - short INT - int BIGINT - long BOOLEAN - boolean FLOAT - float DOUBLE - double STRING - String TIMESTAMP - TimeStamp BINARY - byte[] 1.create table CREATE TABLE 创建一个指定名字的表。如果相同名字的表已经存在,则抛出异常;用户可以用 IF NOT EXIST 选项来忽略这个异常。 EXTERNAL 关键字可以让用户创建一个外部表,在建表的同时指定一个指向实际数据的路径(LOCATION),Hive 创建内部表时,会将数据移动到数据仓库指向的路径;若创建外部表,仅记录数据所在的路径,不对数据的位置做任何改变。在删除表的时候,内部表的元数据和数据会被一起删除,而外部表只删除元数据,不删除数据。 LIKE 允许用户复制现有的表结构,但是不复制数据。 有分区的表可以在创建的时候使用 PARTITIONED BY 语句。一个表可以拥有一个或者多个分区,每一个分区单独存在一个目录下。 CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [ WITH SERDEPROPERTIES (...) ] (Note: only available starting with 0.6.0) ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)] (Note: only available starting with 0.6.0) [AS select_statement] (Note: this feature is only available starting with 0.5.0.) CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name LIKE existing_table_name [LOCATION hdfs_path] data_type : primitive_type | array_type | map_type | struct_type primitive_type : TINYINT | SMALLINT | INT | BIGINT | BOOLEAN | FLOAT | DOUBLE | STRING array_type : ARRAY < data_type > map_type : MAP < primitive_type, data_type > struct_type : STRUCT < col_name : data_type [COMMENT col_comment], ...> row_format : DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)] file_format: : SEQUENCEFILE | TEXTFILE | RCFILE (Note: only available starting with 0.6.0) | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname 练习: 创建一张内部表 创建一张外部表 创建一张带有分区的外部表 2.Alter Table (1)Add Partitions ALTER TABLE table_name ADD [IF NOT EXISTS] partition_spec [ LOCATION 'location1' ] partition_spec [ LOCATION 'location2' ] ... partition_spec: : PARTITION (partition_col = partition_col_value, partition_col = partiton_col_value, ...) 练习:修改表增加分区 (2)Drop Partitions ALTER TABLE table_name DROP partition_spec, partition_spec,... (3)Rename Table ALTER TABLE table_name RENAME TO new_table_name (4)Change Column ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name] 这个命令可以允许改变列名、数据类型、注释、列位置或者它们的任意组合 (5)Add/Replace Columns ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...) ADD是代表新增一字段,字段位置在所有列后面(partition列前);REPLACE则是表示替换表中所有字段。 3.Show 查看表名 SHOW TABLES; 查看表名,部分匹配 SHOW TABLES 'page.*'; SHOW TABLES '.*view'; 查看某表的所有Partition,如果没有就报错: SHOW PARTITIONS page_view; 查看某表结构: DESCRIBE invites; 查看分区内容 SELECT a.foo FROM invites a WHERE a.ds='2008-08-15'; 查看有限行内容,同Greenplum,用limit关键词 SELECT a.foo FROM invites a limit 3; 查看表分区定义 DESCRIBE EXTENDED page_view PARTITION (ds='2008-08-08'); 4.Load LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] Load 操作只是单纯的复制/移动操作,将数据文件移动到 Hive 表对应的位置。 5.Insert (1)Inserting data into Hive Tables from queries INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement (2)Writing data into filesystem from queries INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ... 6.Drop 删除一个内部表的同时会同时删除表的元数据和数据。删除一个外部表,只删除元数据而保留数据。 7.Limit Limit 可以限制查询的记录数。查询的结果是随机选择的 8.Select SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table_reference [WHERE where_condition] [GROUP BY col_list] [ CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list] ] [LIMIT number] 9.JOIN join_table: table_reference JOIN table_factor [join_condition] | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition | table_reference LEFT SEMI JOIN table_reference join_condition table_reference: table_factor | join_table table_factor: tbl_name [alias] | table_subquery alias | ( table_references ) join_condition: ON equality_expression ( AND equality_expression )* equality_expression: expression = expression