参考文章
- https://blog.csdn.net/Xiblade/article/details/82318294
- https://stackoverflow.com/questions/40603714/hive-is-not-allowed-to-impersonate-hive
- https://pypi.org/project/impyla/
远程连接hive,大致有如下几种方法:
- 使用beeline(基于SQLLine CLI的JDBC客户端),使用方法比较简单,但需要先安装hive套餐,且beeline的JDBC版本要与目标服务器的一致。
- 用JDBC API写Java或Scala程序。
- 使用Python模块pyhive或impyla或pyhs2。
在此我仅分享使用pyhive和impyla连接hive的方法。
1.安装依赖
很多报错都是由于没有装齐这些依赖导致的。
sudo apt-get update #centos用yum -y update
sudo apt-get install gcc #centos用yum install gcc
sudo apt-get install python-dev #centos用yum install python-dev
sudo apt-get install libsasl2-dev #centos用yum install cyrus-sasl-devel
2.Python安装
推荐anaconda,简单方便,版本随意,以下仅示例。
wget https://repo.anaconda.com/archive/Anaconda3-5.3.0-Linux-x86_64.sh
bash Anaconda3-5.3.0-Linux-x86_64.sh
#然后根据提示操作
source ~/.bashrc
3.相关模块安装
pyhive与impyla依赖的模块相同,但版本不同,因此需要选择其一使用,建议pyhive。
pyhive相关模块安装
pip install --upgrade pip
pip install pure-sasl
pip install thrift_sasl #如已安老版本则pip install --upgrade thrift_sasl
pip install thrift #如已安老版本则pip install --upgrade thrift
pip install pyhive
impyla相关模块安装
pip install --upgrade pip
pip install pure-sasl
pip install thrift_sasl==0.2.1 #就要这个版本,新版本可能报错
pip install thrift==0.9.3 #建议0.9.3,不过我后来升最新版本也没出问题
pip install impyla
提醒一点,注意环境变量,别装错了地方。
4.使用方法简介
pyhive可以配置hive参数,比如我要写超长sql,可以在Connection里加configuration,而imlyla没有找到解决办法。
pyhive使用方法
from pyhive import hive
conn = hive.Connection(host='localhost', port=10000, username='账号', password='密码', database='default', auth='LDAP', configuration={'mapreduce.map.java.opts': '-Xss20m'})#configuration非必填
cursor = conn.cursor()
cursor.execute('show tables')
result = cursor.fetchall()
impyla使用方法
from impala.dbapi import connect
conn = connect(host='localhost', port=10000, auth_mechanism='PLAIN', user='账号', password='密码', database='default')
cursor = conn.cursor()
cursor.execute('show databases') #sql
result = cursor.fetchall()
#文本格式输出结果(上面是数组格式)
from impala.util import as_pandas
cursor.execute('show databases')
print(as_pandas(cursor))
5.备注
在执行
from pyhive import hive
时,如报如下错误
ImportError: cannot import name 'constants'
则退出python,更新pyhive
pip install --upgrade pyhive
在执行
from impala.dbapi import connect
时,如报类似如下错误
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/anaconda3/lib/python3.7/site-packages/impala/dbapi.py", line 28, in <module>
import impala.hiveserver2 as hs2
File "/root/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py", line 340
async=True)
^
SyntaxError: invalid syntax
则退出python,执行如下代码
conda install -c anaconda impyla
以下报错不出意外的话通过上述方法都可搞定。
thriftpy.protocol.exc.TProtocolException: TProtocolException(type=4)
Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found
AttributeError: 'TSocket' object has no attribute 'isOpen'
fatal error: sasl/sasl.h: No such file or director
error: command 'gcc' failed with exit status 1
Can’t connect to unsecured hive. SASL error: TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2
TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL( -4): no mechanism available: Unable to find a callback: 2