金曜日, 11月 06, 2009

[HIVE] select出来た!

cygwinだと、HIVEからhadoopを呼び出すところで失敗する。なのでこのためにはhiveconfオプションにて、hadoopのまるママの実行文を書いておくべし。

00962724@XXX /cygdrive/d/work/apache-hive/build/dist/conf
$ /cygdrive/d/work/apache-hive/build/dist/bin/hive -hiveconf hadoop.bin.path="bash /cygdrive/d/work/apache-hadoop-0.19.2/bin/hadoop" -e "SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';"

また、以下のようなエラーがでるときはmetadataが壊れていた様子。

Job Submission failed with exception 'java.lang.NullPointerException(null)'

"drop table"して(このコマンドも失敗したけど、その後"show tables"してテーブルが無いようなら"create table"してloadすれば、うまく動くようになった!

Hive history file=/tmp/00962724/hive_job_log_00962724_200911060205_2037141967.txt
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
09/11/06 02:05:32 INFO exec.ExecDriver: Number of reduce tasks is set to 0 since there's no reduce operator
09/11/06 02:05:32 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.HiveInputFormat
09/11/06 02:05:32 INFO exec.ExecDriver: Processing alias a
09/11/06 02:05:32 INFO exec.ExecDriver: Adding input file file:/user/hive/warehouse/invites/ds=2008-08-15
09/11/06 02:05:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/11/06 02:05:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/11/06 02:05:34 INFO mapred.FileInputFormat: Total input paths to process : 1
Job running in-process (local Hadoop)
09/11/06 02:05:35 INFO exec.ExecDriver: Job running in-process (local Hadoop)
09/11/06 02:05:35 INFO mapred.FileInputFormat: Total input paths to process : 1
09/11/06 02:05:35 INFO mapred.MapTask: numReduceTasks: 0
09/11/06 02:05:35 INFO ExecMapper: maximum memory = 1040515072
09/11/06 02:05:35 INFO ExecMapper: conf classpath = [file:/D:/tmp/hadoop-00962724/hadoop-unjar3921453707640367881/, file:/D:/work/apache-hive/build/dist/lib/hive_exec.jar, file:/D:/tmp/hadoop-00962724/hadoop-unjar3921453707640367881/classes]
09/11/06 02:05:35 INFO ExecMapper: thread classpath = [file:/D:/tmp/hadoop-00962724/hadoop-unjar3921453707640367881/, file:/D:/work/apache-hive/build/dist/lib/hive_exec.jar, file:/D:/tmp/hadoop-00962724/hadoop-unjar3921453707640367881/classes]
09/11/06 02:05:35 INFO exec.MapOperator: Adding alias a to work list for file /user/hive/warehouse/invites/ds=2008-08-15/kv2.txt
09/11/06 02:05:35 INFO exec.MapOperator: dump TS struct<foo:int,bar:string,ds:string>
09/11/06 02:05:35 INFO ExecMapper:
<MAP>Id =10
  <Children>
    <TS>Id =0
      <Children>
        <FIL>Id =1
          <Children>
            <FIL>Id =2
              <Children>
                <SEL>Id =3
                  <Children>
                    <FS>Id =4
                      <Parent>Id = 3 <\Parent>
                    <\FS>
                  <\Children>
                  <Parent>Id = 2 <\Parent>
                <\SEL>
              <\Children>
              <Parent>Id = 1 <\Parent>
            <\FIL>
          <\Children>
          <Parent>Id = 0 <\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 10 <\Parent>
    <\TS>
  <\Children>
<\MAP>
09/11/06 02:05:35 INFO exec.MapOperator: Initializing Self 10 MAP
09/11/06 02:05:35 INFO exec.TableScanOperator: Initializing Self 0 TS
09/11/06 02:05:35 INFO exec.TableScanOperator: Operator 0 TS initialized
09/11/06 02:05:35 INFO exec.TableScanOperator: Initializing children of 0 TS
09/11/06 02:05:35 INFO exec.FilterOperator: Initializing child 1 FIL
09/11/06 02:05:35 INFO exec.FilterOperator: Initializing Self 1 FIL
09/11/06 02:05:35 INFO exec.FilterOperator: Operator 1 FIL initialized
09/11/06 02:05:35 INFO exec.FilterOperator: Initializing children of 1 FIL
09/11/06 02:05:35 INFO exec.FilterOperator: Initializing child 2 FIL
09/11/06 02:05:35 INFO exec.FilterOperator: Initializing Self 2 FIL
09/11/06 02:05:35 INFO exec.FilterOperator: Operator 2 FIL initialized
09/11/06 02:05:35 INFO exec.FilterOperator: Initializing children of 2 FIL
09/11/06 02:05:35 INFO exec.SelectOperator: Initializing child 3 SEL
09/11/06 02:05:35 INFO exec.SelectOperator: Initializing Self 3 SEL
09/11/06 02:05:35 INFO exec.SelectOperator: SELECT struct<foo:int,bar:string,ds:string>
09/11/06 02:05:35 INFO exec.SelectOperator: Operator 3 SEL initialized
09/11/06 02:05:35 INFO exec.SelectOperator: Initializing children of 3 SEL
09/11/06 02:05:35 INFO exec.FileSinkOperator: Initializing child 4 FS
09/11/06 02:05:35 INFO exec.FileSinkOperator: Initializing Self 4 FS
09/11/06 02:05:35 INFO exec.FileSinkOperator: Writing to temp file: FS file:/tmp/hive-00962724/1005059804/_tmp.10001/_tmp.attempt_local_0001_m_000000_0
09/11/06 02:05:35 INFO exec.FileSinkOperator: Operator 4 FS initialized
09/11/06 02:05:35 INFO exec.FileSinkOperator: Initialization Done 4 FS
09/11/06 02:05:35 INFO exec.SelectOperator: Initialization Done 3 SEL
09/11/06 02:05:35 INFO exec.FilterOperator: Initialization Done 2 FIL
09/11/06 02:05:35 INFO exec.FilterOperator: Initialization Done 1 FIL
09/11/06 02:05:35 INFO exec.TableScanOperator: Initialization Done 0 TS
09/11/06 02:05:35 INFO exec.MapOperator: Initialization Done 10 MAP
09/11/06 02:05:35 INFO exec.MapOperator: 10 forwarding 1 rows
09/11/06 02:05:35 INFO exec.TableScanOperator: 0 forwarding 1 rows
09/11/06 02:05:35 INFO exec.FilterOperator: 1 forwarding 1 rows
09/11/06 02:05:35 INFO exec.FilterOperator: 2 forwarding 1 rows
09/11/06 02:05:35 INFO exec.SelectOperator: 3 forwarding 1 rows
09/11/06 02:05:35 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 2369824
09/11/06 02:05:35 INFO exec.MapOperator: 10 forwarding 10 rows
09/11/06 02:05:35 INFO exec.TableScanOperator: 0 forwarding 10 rows
09/11/06 02:05:35 INFO exec.FilterOperator: 1 forwarding 10 rows
09/11/06 02:05:35 INFO exec.FilterOperator: 2 forwarding 10 rows
09/11/06 02:05:35 INFO exec.SelectOperator: 3 forwarding 10 rows
09/11/06 02:05:35 INFO ExecMapper: ExecMapper: processing 10 rows: used memory = 2388056
09/11/06 02:05:35 INFO exec.MapOperator: 10 forwarding 100 rows
09/11/06 02:05:35 INFO exec.TableScanOperator: 0 forwarding 100 rows
09/11/06 02:05:35 INFO exec.FilterOperator: 1 forwarding 100 rows
09/11/06 02:05:35 INFO exec.FilterOperator: 2 forwarding 100 rows
09/11/06 02:05:35 INFO exec.SelectOperator: 3 forwarding 100 rows
09/11/06 02:05:35 INFO ExecMapper: ExecMapper: processing 100 rows: used memory = 2388056
09/11/06 02:05:36 INFO exec.MapOperator: 10 finished. closing...
09/11/06 02:05:36 INFO exec.MapOperator: 10 forwarded 500 rows
09/11/06 02:05:36 INFO exec.MapOperator: DESERIALIZE_ERRORS:0
09/11/06 02:05:36 INFO exec.TableScanOperator: 0 finished. closing...
09/11/06 02:05:36 INFO exec.TableScanOperator: 0 forwarded 500 rows
09/11/06 02:05:36 INFO exec.FilterOperator: 1 finished. closing...
09/11/06 02:05:36 INFO exec.FilterOperator: 1 forwarded 500 rows
09/11/06 02:05:36 INFO exec.FilterOperator: PASSED:500
09/11/06 02:05:36 INFO exec.FilterOperator: FILTERED:0
09/11/06 02:05:36 INFO exec.FilterOperator: 2 finished. closing...
09/11/06 02:05:36 INFO exec.FilterOperator: 2 forwarded 500 rows
09/11/06 02:05:36 INFO exec.FilterOperator: PASSED:500
09/11/06 02:05:36 INFO exec.FilterOperator: FILTERED:0
09/11/06 02:05:36 INFO exec.SelectOperator: 3 finished. closing...
09/11/06 02:05:36 INFO exec.SelectOperator: 3 forwarded 500 rows
09/11/06 02:05:36 INFO exec.FileSinkOperator: 4 finished. closing...
09/11/06 02:05:36 INFO exec.FileSinkOperator: 4 forwarded 0 rows
09/11/06 02:05:36 INFO exec.FileSinkOperator: Committed to output file: file:/tmp/hive-00962724/1005059804/_tmp.10001/attempt_local_0001_m_000000_0
09/11/06 02:05:36 INFO exec.SelectOperator: 3 Close done
09/11/06 02:05:36 INFO exec.FilterOperator: 2 Close done
09/11/06 02:05:36 INFO exec.FilterOperator: 1 Close done
09/11/06 02:05:36 INFO exec.TableScanOperator: 0 Close done
09/11/06 02:05:36 INFO exec.MapOperator: 10 Close done
09/11/06 02:05:36 INFO ExecMapper: ExecMapper: processed 500 rows: used memory = 2459320
09/11/06 02:05:36 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
09/11/06 02:05:36 INFO mapred.LocalJobRunner:
09/11/06 02:05:36 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0 is allowed to commit now
09/11/06 02:05:36 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/hive-00962724/384798627
09/11/06 02:05:36 INFO mapred.LocalJobRunner: file:/user/hive/warehouse/invites/ds=2008-08-15/kv2.txt:0+5791
09/11/06 02:05:36 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
2009-11-06 02:05:36,156 map = 100%,  reduce = 0%
09/11/06 02:05:36 INFO exec.ExecDriver: 2009-11-06 02:05:36,156 map = 100%,  reduce = 0%
Ended Job = job_local_0001
09/11/06 02:05:36 INFO exec.ExecDriver: Ended Job = job_local_0001
09/11/06 02:05:36 INFO exec.FileSinkOperator: Moving tmp dir: file:/tmp/hive-00962724/1005059804/_tmp.10001 to: file:/tmp/hive-00962724/1005059804/_tmp.10001.intermediate
09/11/06 02:05:36 INFO exec.FileSinkOperator: Moving tmp dir: file:/tmp/hive-00962724/1005059804/_tmp.10001.intermediate to: file:/tmp/hive-00962724/1005059804/10001
OK
Time taken: 27.86 seconds

00962724@XXX /cygdrive/d/work/apache-hive/build/dist/conf