Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
io.acryl#datahub-spark-lineage added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-5d07e8d1-de9e-44e0-8ffd-c0ade936ca98;1.0
        confs: [default]
        found io.acryl#datahub-spark-lineage;0.12.0 in central
downloading https://repo1.maven.org/maven2/io/acryl/datahub-spark-lineage/0.12.0/datahub-spark-lineage-0.12.0.jar ...
        [SUCCESSFUL ] io.acryl#datahub-spark-lineage;0.12.0!datahub-spark-lineage.jar (761ms)
:: resolution report :: resolve 586ms :: artifacts dl 764ms
        :: modules in use:
        io.acryl#datahub-spark-lineage;0.12.0 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   1   |   1   |   0   ||   1   |   1   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-5d07e8d1-de9e-44e0-8ffd-c0ade936ca98
        confs: [default]
        1 artifacts copied, 0 already retrieved (45700kB/44ms)
23/11/16 17:54:03 INFO SparkContext: Running Spark version 3.2.1-amzn-0
23/11/16 17:54:03 INFO ResourceUtils: ==============================================================
23/11/16 17:54:03 INFO ResourceUtils: No custom resources configured for spark.driver.
23/11/16 17:54:03 INFO ResourceUtils: ==============================================================
23/11/16 17:54:03 INFO SparkContext: Submitted application: derivateTable
23/11/16 17:54:03 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 14336, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/11/16 17:54:03 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
23/11/16 17:54:03 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/11/16 17:54:04 INFO SecurityManager: Changing view acls to: hadoop
23/11/16 17:54:04 INFO SecurityManager: Changing modify acls to: hadoop
23/11/16 17:54:04 INFO SecurityManager: Changing view acls groups to:
23/11/16 17:54:04 INFO SecurityManager: Changing modify acls groups to:
23/11/16 17:54:04 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
23/11/16 17:54:04 INFO Utils: Successfully started service 'sparkDriver' on port 33253.
23/11/16 17:54:04 INFO SparkEnv: Registering MapOutputTracker
23/11/16 17:54:04 INFO SparkEnv: Registering BlockManagerMaster
23/11/16 17:54:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/11/16 17:54:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/11/16 17:54:04 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/11/16 17:54:04 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-874eea25-c6d6-4735-8238-5b091a0447bd
23/11/16 17:54:04 INFO MemoryStore: MemoryStore started with capacity 7.3 GiB
23/11/16 17:54:04 INFO SparkEnv: Registering OutputCommitCoordinator
23/11/16 17:54:04 INFO SubResultCacheManager: Sub-result caches are disabled.
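The Ivy resolution above corresponds to submitting the job with the DataHub lineage agent as a package dependency (io.acryl:datahub-spark-lineage:0.12.0). A minimal PySpark sketch of how such a session could be configured follows; only the package coordinate, the application name, and the listener class (registered later in this log) come from the log itself, while the DataHub server config key and URL are assumptions.

    # Sketch only, not the submitted script. The spark.datahub.rest.server key and
    # URL are assumed; the package coordinate, app name, and listener class appear
    # in the log. The same settings are often passed via spark-submit --packages/--conf.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("derivateTable")
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.12.0")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        .config("spark.datahub.rest.server", "http://datahub-gms:8080")  # assumed endpoint
        .getOrCreate()
    )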
23/11/16 17:54:04 INFO log: Logging initialized @6107ms to org.sparkproject.jetty.util.log.Slf4jLog
23/11/16 17:54:04 INFO Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_382-b05
23/11/16 17:54:04 INFO Server: Started @6191ms
23/11/16 17:54:04 INFO AbstractConnector: Started ServerConnector@3de8376e{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
23/11/16 17:54:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@22967631{/jobs,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@778d6658{/jobs/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6cd1d6cb{/jobs/job,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4c588d6c{/jobs/job/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5b642cd7{/stages,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5f5c1a3d{/stages/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@12e74c0a{/stages/stage,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7a354ef9{/stages/stage/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@56ae7da7{/stages/pool,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@25f71d7{/stages/pool/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@47d354bf{/storage,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1f6ce16{/storage/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1450ce89{/storage/rdd,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2df9f0f0{/storage/rdd/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2ec8e34f{/environment,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@161873d4{/environment/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@31a55521{/executors,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@33bbc348{/executors/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2501f4{/executors/threadDump,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4f7791f4{/executors/threadDump/json,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6e518548{/static,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@779098de{/,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@90989b8{/api,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@28dcaf1a{/jobs/job/kill,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@146f4750{/stages/stage/kill,null,AVAILABLE,@Spark}
23/11/16 17:54:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://[2600:1f18:148c:9803:d18e:5421:a449:d0f4]:4040
23/11/16 17:54:04 INFO SparkContext: Added JAR file:///home/hadoop/.ivy2/jars/io.acryl_datahub-spark-lineage-0.12.0.jar at spark://[2600:1f18:148c:9803:d18e:5421:a449:d0f4]:33253/jars/io.acryl_datahub-spark-lineage-0.12.0.jar with timestamp 1700157243897
23/11/16 17:54:04 INFO SparkContext: Added file file:///home/hadoop/.ivy2/jars/io.acryl_datahub-spark-lineage-0.12.0.jar at spark://[2600:1f18:148c:9803:d18e:5421:a449:d0f4]:33253/files/io.acryl_datahub-spark-lineage-0.12.0.jar with timestamp 1700157243897
23/11/16 17:54:04 INFO Utils: Copying /home/hadoop/.ivy2/jars/io.acryl_datahub-spark-lineage-0.12.0.jar to /tmp/spark-869daa7a-25c1-4e02-afeb-38b77bd824db/userFiles-f475fc94-c932-43a2-9b5d-fd9e66c81671/io.acryl_datahub-spark-lineage-0.12.0.jar
23/11/16 17:54:05 INFO Utils: Using initial executors = 3, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
23/11/16 17:54:05 INFO ExecutorContainerAllocator: Set total expected execs to {0=3}
23/11/16 17:54:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38709.
23/11/16 17:54:05 INFO NettyBlockTransferService: Server created on [2600:1f18:148c:9803:d18e:5421:a449:d0f4]:38709
23/11/16 17:54:05 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/11/16 17:54:05 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, [2600:1f18:148c:9803:d18e:5421:a449:d0f4], 38709, None)
23/11/16 17:54:05 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:148c:9803:d18e:5421:a449:d0f4]:38709 with 7.3 GiB RAM, BlockManagerId(driver, [2600:1f18:148c:9803:d18e:5421:a449:d0f4], 38709, None)
23/11/16 17:54:05 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, [2600:1f18:148c:9803:d18e:5421:a449:d0f4], 38709, None)
23/11/16 17:54:05 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, [2600:1f18:148c:9803:d18e:5421:a449:d0f4], 38709, None)
23/11/16 17:54:05 INFO ExecutorContainerAllocator: Going to request 3 executors for ResourceProfile Id: 0, target: 3 already provisioned: 0.
23/11/16 17:54:05 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@550f7fa5{/metrics/json,null,AVAILABLE,@Spark}
23/11/16 17:54:05 INFO TimeBasedRotatingEventLogFilesWriter: rotationIntervalInSeconds = 300, eventFileMinSize = 1048576, maxFilesToRetain = 2
23/11/16 17:54:05 INFO DefaultEmrServerlessRMClient: Creating containers with container role SPARK_EXECUTOR and keys: Set(1, 2, 3)
23/11/16 17:54:05 INFO TimeBasedRotatingEventLogFilesWriter: Logging events to file:/var/log/spark/apps/eventlog_v2_00fepu9m0fhn7o0a/00fepu9m0fhn7o0a.inprogress
23/11/16 17:54:05 INFO Utils: Using initial executors = 3, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
23/11/16 17:54:05 WARN ExecutorAllocationManager: Dynamic allocation without a shuffle service is an experimental feature.
23/11/16 17:54:05 INFO ExecutorContainerAllocator: Set total expected execs to {0=3}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
23/11/16 17:54:05 INFO SparkContext: Registered listener datahub.spark.DatahubSparkListener
23/11/16 17:54:05 INFO DefaultEmrServerlessRMClient: Containers created with container role SPARK_EXECUTOR. key to container id map: Map(2 -> 52c5eca1-ef05-738a-7e36-5595b2471d6e, 1 -> 04c5eca1-eefb-e0cf-b158-4ad294d693ff, 3 -> c4c5eca1-ef0e-d9a8-9d0b-874ed52dd68f)
23/11/16 17:54:08 INFO EmrServerlessClusterSchedulerBackend$EmrServerlessDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (2600:1f18:148c:9803:974e:ce19:89f9:751d:41270) with ID 2, ResourceProfileId 0
23/11/16 17:54:08 INFO ExecutorMonitor: New executor 2 has registered (new total is 1)
23/11/16 17:54:09 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:148c:9803:974e:ce19:89f9:751d]:44435 with 7.9 GiB RAM, BlockManagerId(2, [2600:1f18:148c:9803:974e:ce19:89f9:751d], 44435, None)
23/11/16 17:54:09 INFO EmrServerlessClusterSchedulerBackend$EmrServerlessDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (2600:1f18:148c:9803:fae6:705:86f8:a499:49228) with ID 3, ResourceProfileId 0
23/11/16 17:54:09 INFO ExecutorMonitor: New executor 3 has registered (new total is 2)
23/11/16 17:54:09 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:148c:9803:fae6:705:86f8:a499]:40915 with 7.9 GiB RAM, BlockManagerId(3, [2600:1f18:148c:9803:fae6:705:86f8:a499], 40915, None)
23/11/16 17:54:10 INFO EmrServerlessClusterSchedulerBackend$EmrServerlessDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc:39388) with ID 1, ResourceProfileId 0
23/11/16 17:54:10 INFO ExecutorMonitor: New executor 1 has registered (new total is 3)
23/11/16 17:54:10 INFO EmrServerlessClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
23/11/16 17:54:10 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc]:40779 with 7.9 GiB RAM, BlockManagerId(1, [2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc], 40779, None)
23/11/16 17:54:10 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
23/11/16 17:54:10 INFO SharedState: Warehouse path is 'file:/home/hadoop/spark-warehouse'.
23/11/16 17:54:10 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@55d4159c{/SQL,null,AVAILABLE,@Spark}
23/11/16 17:54:10 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1d532b41{/SQL/json,null,AVAILABLE,@Spark}
23/11/16 17:54:10 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@654b2255{/SQL/execution,null,AVAILABLE,@Spark}
23/11/16 17:54:10 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2357245d{/SQL/execution/json,null,AVAILABLE,@Spark}
23/11/16 17:54:10 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1ea6a30f{/static/sql,null,AVAILABLE,@Spark}
23/11/16 17:54:12 INFO HiveConf: Found configuration file file:/etc/spark/conf/hive-site.xml
23/11/16 17:54:12 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.9-amzn-2 using Spark classes.
23/11/16 17:54:12 INFO HiveConf: Found configuration file file:/etc/spark/conf/hive-site.xml
23/11/16 17:54:12 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.9) is file:/home/hadoop/spark-warehouse
23/11/16 17:54:12 INFO AWSGlueClientFactory: Setting region to : us-east-1
23/11/16 17:54:14 INFO InMemoryFileIndex: It took 79 ms to list leaf files for 1 paths.
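With the listener registered, the executors up, and the Glue metastore client initialized, the entries that follow belong to the first action (job 0, "showString"). A hedged reconstruction of the driver code behind it, assuming a plain table read and preview; the actual script is not in the log, only the table name and the call sites.

    # Read the source table seen in the scan below and preview it; df.show() is
    # what produces the "showString" job and the GlobalLimit 21 in the plan.
    df = spark.table("tmp_drew.final_attempt")
    df.show()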
23/11/16 17:54:16 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
23/11/16 17:54:17 INFO FileSourceStrategy: Pushed Filters:
23/11/16 17:54:17 INFO FileSourceStrategy: Post-Scan Filters:
23/11/16 17:54:17 INFO FileSourceStrategy: Output Data Schema: struct<job_type: string, job_source: string, avs_version: string, gtr_version: string, jenkins_gtr_build_url: string ... 594 more fields>
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
23/11/16 17:54:18 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 579.6 KiB, free 7.3 GiB)
23/11/16 17:54:18 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 48.8 KiB, free 7.3 GiB)
23/11/16 17:54:18 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on [2600:1f18:148c:9803:d18e:5421:a449:d0f4]:38709 (size: 48.8 KiB, free: 7.3 GiB)
23/11/16 17:54:18 INFO SparkContext: Created broadcast 0 from showString at NativeMethodAccessorImpl.java:0
23/11/16 17:54:18 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes, number of split files: 1, prefetch: true
23/11/16 17:54:18 INFO FileSourceScanExec: relation: Some(`tmp_drew`.`final_attempt`), fileSplitsInPartitionHistogram: Vector((1 fileSplits,1))
23/11/16 17:54:18 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
23/11/16 17:54:18 INFO DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
23/11/16 17:54:18 INFO DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0)
23/11/16 17:54:18 INFO DAGScheduler: Parents of final stage: List()
23/11/16 17:54:18 INFO DAGScheduler: Missing parents: List()
23/11/16 17:54:18 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
23/11/16 17:54:18 INFO ExecutorContainerAllocator: Set total expected execs to {0=1}
23/11/16 17:54:18 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 176.5 KiB, free 7.3 GiB)
23/11/16 17:54:18 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 43.0 KiB, free 7.3 GiB)
23/11/16 17:54:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on [2600:1f18:148c:9803:d18e:5421:a449:d0f4]:38709 (size: 43.0 KiB, free: 7.3 GiB)
23/11/16 17:54:18 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1518
23/11/16 17:54:18 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
23/11/16 17:54:18 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
23/11/16 17:54:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) ([2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc], executor 1, partition 0, ANY, 5086 bytes) taskResourceAssignments Map()
23/11/16 17:54:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on [2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc]:40779 (size: 43.0 KiB, free: 7.9 GiB)
23/11/16 17:54:19 INFO AsyncEventQueue: Process of event SparkListenerSQLExecutionStart(0,showString at
NativeMethodAccessorImpl.java:0,org.apache.spark.sql.Dataset.showString(Dataset.scala:328) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) java.lang.reflect.Method.invoke(Method.java:498) py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) py4j.Gateway.invoke(Gateway.java:282) py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) py4j.commands.CallCommand.execute(CallCommand.java:79) py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) py4j.ClientServerConnection.run(ClientServerConnection.java:106) java.lang.Thread.run(Thread.java:750),== Parsed Logical Plan == GlobalLimit 21 +- LocalLimit 21 +- Project [cast(job_type#0 as string) AS job_type#1788, cast(job_source#1 as string) AS job_source#1789, cast(avs_version#2 as string) AS avs_version#1790, cast(gtr_version#3 as string) AS gtr_version#1791, cast(jenkins_gtr_build_url#4 as string) AS jenkins_gtr_build_url#1792, cast(jenkins_gtr_build_id#5 as string) AS jenkins_gtr_build_id#1793, cast(jenkins_avs_build_url#6 as string) AS jenkins_avs_build_url#1794, cast(jenkins_avs_build_id#7 as string) AS jenkins_avs_build_id#1795, cast(is_gtr_mr#8 as string) AS is_gtr_mr#1796, cast(is_avs_mr#9 as string) AS is_avs_mr#1797, cast(gtr_mr_id#10 as string) AS gtr_mr_id#1798, cast(avs_mr_id#11 as string) AS avs_mr_id#1799, cast(submit_time#12 as string) AS submit_time#2382, cast(last_modified_time#13 as string) AS last_modified_time#2383, cast(baseline_job_id#14 as string) AS baseline_job_id#1800, cast(n_tasks#15L as string) AS n_tasks#1801, cast(n_tasks_success#16L as string) AS n_tasks_success#1802, cast(scenario_hash#17 as string) AS scenario_hash#1803, cast(scenario_hash_simple#18 as string) AS scenario_hash_simple#1804, cast(n_scenarios#19L as string) AS n_scenarios#1805, cast(classification_precision#20 as string) AS classification_precision#1806, cast(classification_recall#21 as string) AS classification_recall#1807, cast(classification_f1#22 as string) AS classification_f1#1808, cast(classification_wf1#23 as string) AS classification_wf1#1809, ... 572 more fields] +- Project [job_type#0, job_source#1, avs_version#2, gtr_version#3, jenkins_gtr_build_url#4, jenkins_gtr_build_id#5, jenkins_avs_build_url#6, jenkins_avs_build_id#7, is_gtr_mr#8, is_avs_mr#9, gtr_mr_id#10, avs_mr_id#11, submit_time#12, last_modified_time#13, baseline_job_id#14, n_tasks#15L, n_tasks_success#16L, scenario_hash#17, scenario_hash_simple#18, n_scenarios#19L, classification_precision#20, classification_recall#21, classification_f1#22, classification_wf1#23, ... 572 more fields] +- SubqueryAlias spark_catalog.tmp_drew.final_attempt +- Relation tmp_drew.final_attempt[job_type#0,job_source#1,avs_version#2,gtr_version#3,jenkins_gtr_build_url#4,jenkins_gtr_build_id#5,jenkins_avs_build_url#6,jenkins_avs_build_id#7,is_gtr_mr#8,is_avs_mr#9,gtr_mr_id#10,avs_mr_id#11,submit_time#12,last_modified_time#13,baseline_job_id#14,n_tasks#15L,n_tasks_success#16L,scenario_hash#17,scenario_hash_simple#18,n_scenarios#19L,classification_precision#20,classification_recall#21,classification_f1#22,classification_wf1#23,... 
572 more fields] parquet == Analyzed Logical Plan == job_type: string, job_source: string, avs_version: string, gtr_version: string, jenkins_gtr_build_url: string, jenkins_gtr_build_id: string, jenkins_avs_build_url: string, jenkins_avs_build_id: string, is_gtr_mr: string, is_avs_mr: string, gtr_mr_id: string, avs_mr_id: string, submit_time: string, last_modified_time: string, baseline_job_id: string, n_tasks: string, n_tasks_success: string, scenario_hash: string, scenario_hash_simple: string, n_scenarios: string, classification_precision: string, classification_recall: string, classification_f1: string, classification_wf1: string, ... 572 more fields GlobalLimit 21 +- LocalLimit 21 +- Project [cast(job_type#0 as string) AS job_type#1788, cast(job_source#1 as string) AS job_source#1789, cast(avs_version#2 as string) AS avs_version#1790, cast(gtr_version#3 as string) AS gtr_version#1791, cast(jenkins_gtr_build_url#4 as string) AS jenkins_gtr_build_url#1792, cast(jenkins_gtr_build_id#5 as string) AS jenkins_gtr_build_id#1793, cast(jenkins_avs_build_url#6 as string) AS jenkins_avs_build_url#1794, cast(jenkins_avs_build_id#7 as string) AS jenkins_avs_build_id#1795, cast(is_gtr_mr#8 as string) AS is_gtr_mr#1796, cast(is_avs_mr#9 as string) AS is_avs_mr#1797, cast(gtr_mr_id#10 as string) AS gtr_mr_id#1798, cast(avs_mr_id#11 as string) AS avs_mr_id#1799, cast(submit_time#12 as string) AS submit_time#2382, cast(last_modified_time#13 as string) AS last_modified_time#2383, cast(baseline_job_id#14 as string) AS baseline_job_id#1800, cast(n_tasks#15L as string) AS n_tasks#1801, cast(n_tasks_success#16L as string) AS n_tasks_success#1802, cast(scenario_hash#17 as string) AS scenario_hash#1803, cast(scenario_hash_simple#18 as string) AS scenario_hash_simple#1804, cast(n_scenarios#19L as string) AS n_scenarios#1805, cast(classification_precision#20 as string) AS classification_precision#1806, cast(classification_recall#21 as string) AS classification_recall#1807, cast(classification_f1#22 as string) AS classification_f1#1808, cast(classification_wf1#23 as string) AS classification_wf1#1809, ... 572 more fields] +- Project [job_type#0, job_source#1, avs_version#2, gtr_version#3, jenkins_gtr_build_url#4, jenkins_gtr_build_id#5, jenkins_avs_build_url#6, jenkins_avs_build_id#7, is_gtr_mr#8, is_avs_mr#9, gtr_mr_id#10, avs_mr_id#11, submit_time#12, last_modified_time#13, baseline_job_id#14, n_tasks#15L, n_tasks_success#16L, scenario_hash#17, scenario_hash_simple#18, n_scenarios#19L, classification_precision#20, classification_recall#21, classification_f1#22, classification_wf1#23, ... 572 more fields] +- SubqueryAlias spark_catalog.tmp_drew.final_attempt +- Relation tmp_drew.final_attempt[job_type#0,job_source#1,avs_version#2,gtr_version#3,jenkins_gtr_build_url#4,jenkins_gtr_build_id#5,jenkins_avs_build_url#6,jenkins_avs_build_id#7,is_gtr_mr#8,is_avs_mr#9,gtr_mr_id#10,avs_mr_id#11,submit_time#12,last_modified_time#13,baseline_job_id#14,n_tasks#15L,n_tasks_success#16L,scenario_hash#17,scenario_hash_simple#18,n_scenarios#19L,classification_precision#20,classification_recall#21,classification_f1#22,classification_wf1#23,... 
572 more fields] parquet == Optimized Logical Plan == GlobalLimit 21 +- LocalLimit 21 +- Project [job_type#0, job_source#1, avs_version#2, gtr_version#3, jenkins_gtr_build_url#4, jenkins_gtr_build_id#5, jenkins_avs_build_url#6, jenkins_avs_build_id#7, cast(is_gtr_mr#8 as string) AS is_gtr_mr#1796, cast(is_avs_mr#9 as string) AS is_avs_mr#1797, gtr_mr_id#10, avs_mr_id#11, cast(submit_time#12 as string) AS submit_time#2382, cast(last_modified_time#13 as string) AS last_modified_time#2383, baseline_job_id#14, cast(n_tasks#15L as string) AS n_tasks#1801, cast(n_tasks_success#16L as string) AS n_tasks_success#1802, scenario_hash#17, scenario_hash_simple#18, cast(n_scenarios#19L as string) AS n_scenarios#1805, cast(classification_precision#20 as string) AS classification_precision#1806, cast(classification_recall#21 as string) AS classification_recall#1807, cast(classification_f1#22 as string) AS classification_f1#1808, cast(classification_wf1#23 as string) AS classification_wf1#1809, ... 572 more fields] +- Relation tmp_drew.final_attempt[job_type#0,job_source#1,avs_version#2,gtr_version#3,jenkins_gtr_build_url#4,jenkins_gtr_build_id#5,jenkins_avs_build_url#6,jenkins_avs_build_id#7,is_gtr_mr#8,is_avs_mr#9,gtr_mr_id#10,avs_mr_id#11,submit_time#12,last_modified_time#13,baseline_job_id#14,n_tasks#15L,n_tasks_success#16L,scenario_hash#17,scenario_hash_simple#18,n_scenarios#19L,classification_precision#20,classification_recall#21,classification_f1#22,classification_wf1#23,... 572 more fields] parquet == Physical Plan == CollectLimit 21 +- Project [job_type#0, job_source#1, avs_version#2, gtr_version#3, jenkins_gtr_build_url#4, jenkins_gtr_build_id#5, jenkins_avs_build_url#6, jenkins_avs_build_id#7, cast(is_gtr_mr#8 as string) AS is_gtr_mr#1796, cast(is_avs_mr#9 as string) AS is_avs_mr#1797, gtr_mr_id#10, avs_mr_id#11, cast(submit_time#12 as string) AS submit_time#2382, cast(last_modified_time#13 as string) AS last_modified_time#2383, baseline_job_id#14, cast(n_tasks#15L as string) AS n_tasks#1801, cast(n_tasks_success#16L as string) AS n_tasks_success#1802, scenario_hash#17, scenario_hash_simple#18, cast(n_scenarios#19L as string) AS n_scenarios#1805, cast(classification_precision#20 as string) AS classification_precision#1806, cast(classification_recall#21 as string) AS classification_recall#1807, cast(classification_f1#22 as string) AS classification_f1#1808, cast(classification_wf1#23 as string) AS classification_wf1#1809, ... 572 more fields] +- FileScan parquet tmp_drew.final_attempt[job_type#0,job_source#1,avs_version#2,gtr_version#3,jenkins_gtr_build_url#4,jenkins_gtr_build_id#5,jenkins_avs_build_url#6,jenkins_avs_build_id#7,is_gtr_mr#8,is_avs_mr#9,gtr_mr_id#10,avs_mr_id#11,submit_time#12,last_modified_time#13,baseline_job_id#14,n_tasks#15L,n_tasks_success#16L,scenario_hash#17,scenario_hash_simple#18,n_scenarios#19L,classification_precision#20,classification_recall#21,classification_f1#22,classification_wf1#23,... 572 more fields] Batched: false, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[s3://tmp-bucket-delete-drew-datahub/final_attempt], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<job_type:string,job_source:string,avs_version:string,gtr_version:string,jenkins_gtr_build_... ,org.apache.spark.sql.execution.SparkPlanInfo@b807ee,1700157257688) by listener DatahubSparkListener took 1.828024576s. 
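The plan text above is cut short with "... 572 more fields" because of the truncation warned about earlier; per that warning, the limit can be raised through spark.sql.debug.maxToStringFields if the full plan is needed. A sketch, assuming the session object from the earlier snippet:

    # Raise the plan-string truncation limit named in the earlier WARN
    # (sketch; 1000 is an arbitrary example value).
    spark.conf.set("spark.sql.debug.maxToStringFields", 1000)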
23/11/16 17:54:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on [2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc]:40779 (size: 48.8 KiB, free: 7.9 GiB)
23/11/16 17:54:23 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4198 ms on [2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc] (executor 1) (1/1)
23/11/16 17:54:23 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
23/11/16 17:54:23 INFO DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0) finished in 4.550 s
23/11/16 17:54:23 INFO ExecutorContainerAllocator: Set total expected execs to {0=0}
23/11/16 17:54:23 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
23/11/16 17:54:23 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
23/11/16 17:54:23 INFO DAGScheduler: Job 0 finished: showString at NativeMethodAccessorImpl.java:0, took 4.647069 s
23/11/16 17:54:24 INFO BlockManagerInfo: Removed broadcast_1_piece0 on [2600:1f18:148c:9803:d18e:5421:a449:d0f4]:38709 in memory (size: 43.0 KiB, free: 7.3 GiB)
23/11/16 17:54:24 INFO BlockManagerInfo: Removed broadcast_1_piece0 on [2600:1f18:148c:9803:56f0:3cc8:7c93:8cfc]:40779 in memory (size: 43.0 KiB, free: 7.9 GiB)
23/11/16 17:54:24 INFO CodeGenerator: Code generated in 983.728621 ms
23/11/16 17:54:24 INFO FileSourceStrategy: Pushed Filters:
23/11/16 17:54:24 INFO FileSourceStrategy: Post-Scan Filters:
23/11/16 17:54:24 INFO FileSourceStrategy: Output Data Schema: struct<job_type: string, job_source: string, avs_version: string, gtr_version: string, jenkins_gtr_build_url: string ... 594 more fields>
23/11/16 17:54:25 INFO ParquetFileFormat: Using user defined output committer for Parquet: com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
23/11/16 17:54:25 INFO SQLConfCommitterProvider: Getting user defined output committer class com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
23/11/16 17:54:25 INFO EmrOptimizedParquetOutputCommitter: EMR Optimized Committer: ENABLED
23/11/16 17:54:25 INFO EmrOptimizedParquetOutputCommitter: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileSystemOptimizedCommitter
23/11/16 17:54:25 INFO FileOutputCommitter: File Output Committer Algorithm version is 2
23/11/16 17:54:25 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: true
23/11/16 17:54:25 INFO FileOutputCommitter: File Output Committer Algorithm version is 2
23/11/16 17:54:25 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: true
23/11/16 17:54:25 INFO SQLConfCommitterProvider: Using output committer class com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
23/11/16 17:54:25 INFO FileSystemOptimizedCommitter: Nothing to setup as successful task attempt outputs are written directly
23/11/16 17:54:25 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 579.6 KiB, free 7.3 GiB)
23/11/16 17:54:25 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 48.8 KiB, free 7.3 GiB)
23/11/16 17:54:25 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on [2600:1f18:148c:9803:d18e:5421:a449:d0f4]:38709 (size: 48.8 KiB, free: 7.3 GiB)
23/11/16 17:54:25 INFO SparkContext: Created broadcast 2 from saveAsTable at NativeMethodAccessorImpl.java:0
23/11/16 17:54:25 INFO FileSourceScanExec:
Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes, number of split files: 1, prefetch: true
23/11/16 17:54:25 INFO FileSourceScanExec: relation: Some(`tmp_drew`.`final_attempt`), fileSplitsInPartitionHistogram: Vector((1 fileSplits,1))
23/11/16 17:54:25 INFO SparkContext: Starting job: saveAsTable at NativeMethodAccessorImpl.java:0
23/11/16 17:54:25 INFO DAGScheduler: Got job 1 (saveAsTable at NativeMethodAccessorImpl.java:0) with 1 output partitions
23/11/16 17:54:25 INFO DAGScheduler: Final stage: ResultStage 1 (saveAsTable at NativeMethodAccessorImpl.java:0)
23/11/16 17:54:25 INFO DAGScheduler: Parents of final stage: List()
23/11/16 17:54:25 INFO DAGScheduler: Missing parents: List()
23/11/16 17:54:25 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at saveAsTable at NativeMethodAccessorImpl.java:0), which has no missing parents
23/11/16 17:54:25 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 348.4 KiB, free 7.3 GiB)
23/11/16 17:54:25 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 105.9 KiB, free 7.3 GiB)
23/11/16 17:54:25 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on [2600:1f18:148c:9803:d18e:5421:a449:d0f4]:38709 (size: 105.9 KiB, free: 7.3 GiB)
23/11/16 17:54:25 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1518
23/11/16 17:54:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at saveAsTable at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
23/11/16 17:54:25 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
23/11/16 17:54:25 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) ([2600:1f18:148c:9803:fae6:705:86f8:a499], executor 3, partition 0, ANY, 5086 bytes) taskResourceAssignments Map()
23/11/16 17:54:25 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on [2600:1f18:148c:9803:fae6:705:86f8:a499]:40915 (size: 105.9 KiB, free: 7.9 GiB)
23/11/16 17:54:26 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on [2600:1f18:148c:9803:fae6:705:86f8:a499]:40915 (size: 48.8 KiB, free: 7.9 GiB)
23/11/16 17:54:29 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 4253 ms on [2600:1f18:148c:9803:fae6:705:86f8:a499] (executor 3) (1/1)
23/11/16 17:54:29 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
23/11/16 17:54:29 INFO DAGScheduler: ResultStage 1 (saveAsTable at NativeMethodAccessorImpl.java:0) finished in 4.297 s
23/11/16 17:54:29 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
23/11/16 17:54:29 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
23/11/16 17:54:29 INFO DAGScheduler: Job 1 finished: saveAsTable at NativeMethodAccessorImpl.java:0, took 4.303416 s
23/11/16 17:54:29 INFO FileFormatWriter: Start to commit write Job 451a4592-5d45-4dd9-88f5-39b4f75e198f.
23/11/16 17:54:29 INFO MultipartUploadOutputStream: close closed:false s3://tmp-bucket-delete-drew-datahub/derivative_table_2/_SUCCESS
23/11/16 17:54:30 INFO FileFormatWriter: Write Job 451a4592-5d45-4dd9-88f5-39b4f75e198f committed. Elapsed time: 204 ms.
23/11/16 17:54:30 INFO FileFormatWriter: Finished processing stats for write job 451a4592-5d45-4dd9-88f5-39b4f75e198f.
23/11/16 17:54:30 INFO InMemoryFileIndex: It took 30 ms to list leaf files for 1 paths.
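Job 1 writes the derived table to S3, and the next entry registers it in the Glue/Hive catalog. A hedged reconstruction of the call behind it, reusing the DataFrame from the earlier sketch; the output path is taken from the _SUCCESS marker above, while the Parquet format/options and any intermediate transformations that make this a "derivative" table are assumptions not visible in the log.

    # Write the derivative table as an external Parquet table backed by S3 and
    # register it in the metastore; this produces the "saveAsTable" job above.
    (df.write
        .format("parquet")
        .option("path", "s3://tmp-bucket-delete-drew-datahub/derivative_table_2")
        .saveAsTable("tmp_drew.derivative_table_2"))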
23/11/16 17:54:30 INFO HiveExternalCatalog: Persisting file based data source table `tmp_drew`.`derivative_table_2` into Hive metastore in Hive compatible format.
23/11/16 17:54:30 INFO SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=a4821e72-638f-4c24-b3b2-fa03d1748c18, clientType=HIVECLI]
23/11/16 17:54:30 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
23/11/16 17:54:30 INFO AWSCatalogMetastoreClient: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
23/11/16 17:54:30 INFO AWSGlueClientFactory: Setting region to : us-east-1
23/11/16 17:54:31 INFO AWSGlueClientFactory: Setting region to : us-east-1
23/11/16 17:54:31 INFO AbstractConnector: Stopped Spark@3de8376e{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
23/11/16 17:54:31 INFO SparkUI: Stopped Spark web UI at http://[2600:1f18:148c:9803:d18e:5421:a449:d0f4]:4040
23/11/16 17:54:31 INFO EmrServerlessClusterSchedulerBackend: Shutting down all executors
23/11/16 17:54:31 INFO EmrServerlessClusterSchedulerBackend$EmrServerlessDriverEndpoint: Asking each executor to shut down
23/11/16 17:54:31 INFO TimeBasedRotatingEventLogFilesWriter: Renaming file:/var/log/spark/apps/eventlog_v2_00fepu9m0fhn7o0a/00fepu9m0fhn7o0a.inprogress to file:/var/log/spark/apps/eventlog_v2_00fepu9m0fhn7o0a/events_1_00fepu9m0fhn7o0a
23/11/16 17:54:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/11/16 17:54:31 INFO MemoryStore: MemoryStore cleared
23/11/16 17:54:31 INFO BlockManager: BlockManager stopped
23/11/16 17:54:31 INFO BlockManagerMaster: BlockManagerMaster stopped
23/11/16 17:54:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/11/16 17:54:31 INFO SparkContext: Successfully stopped SparkContext
23/11/16 17:54:32 INFO ShutdownHookManager: Shutdown hook called
23/11/16 17:54:32 INFO ShutdownHookManager: Deleting directory /tmp/spark-747f5431-43e4-4443-a859-8f63df1d40ed
23/11/16 17:54:32 INFO ShutdownHookManager: Deleting directory /tmp/spark-869daa7a-25c1-4e02-afeb-38b77bd824db
23/11/16 17:54:32 INFO ShutdownHookManager: Deleting directory /tmp/spark-869daa7a-25c1-4e02-afeb-38b77bd824db/pyspark-f5e29928-7c24-43e4-8fae-cc985d82965f