Skip to content

hive 数据迁移

生成数据导出脚本

hive -e "set hive.cli.print.header=false;use ta;show tables" | awk '{printf "export table ta.%s to \47/data_trans/ta/%s\47;\n",$1,$1}' > export_ta.sql;

执行导出脚本

hive  -f  export_ta.sql

传递数据

网络联通

hadoop distcp -skipcrccheck -update hdfs://master001:9000/test hdfs://master002:9000/test -skipcrccheck 省略crc检查,如果hadoop版本一样,可以不加,如果低版本向高版本迁移的话,最好带上 -update 增量更新,通过文件名称,大小比较,源文件和目标文件不同则更新

网络不连通

hdfs dfs -get /tmp/hive-export/ scp -r hive-export/ export.sql master002 hdfs dfs -put hive-export/ /tmp/hive-export

hadoop distcp hdfs://localhost:9000/data_trans/ta hdfs://master-1-1.c-516899f543f5228f.cn-beijing.emr.aliyuncs.com:9000/user/hive/warehouse/gala.db

生成数据导入脚本

cp export_ta.sql import_ta.sql
sed -i 's/export table/import table/g' import_ta.sql
sed -i 's/ to / from /g' import_ta.sql

导入数据

hive -f import.sql

hive迁移数据,重新建表load 分区之后查询报错字段对不上

https://stackoverflow.com/questions/59647638/presto-query-error-on-hive-orc-can-not-read-sql-type-real-from-orc-stream-of-ty

https://github.com/prestodb/presto/issues/15497

Presto query error on hive ORC, Can not read SQL type real from ORC stream of type DOUBLE

I was able to run query in presto to read the non-float columns from Hive ORC(snappy) table. However, when I select all float datatype columns through the presto cli, gives me the below error message. Any suggestions what is the alternative other than changing the filed type to double in the targetHive table

presto:sample> select * from emp_detail;

Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc (offset=0, length=1999857): Malformed ORC file. Can not read SQL type real from ORC stream .salary of type DOUBLE [hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc]

解答

Please try to add this property

hive.orc.use-column-names=true

to presto-server/conf/catalog/hive.properties, and restart your presto server.

To test it without restarting the server run this from presto-cli

SET SESSION hive.orc_use_column_names=true;

Release notes from Presto regarding these attribute.

文章来源于自己总结和网络转载,内容如有任何问题,请大佬斧正!联系我