CDH 5 Release Notes
The following lists all Apache Hive Jiras included in CDH 5
that are not included in the Apache Hive base version 0.11.0. The
file lists all changes included in CDH 5. The patch for each
change can be found in the cloudera/patches directory in the release tarball.
Changes Not In Apache Hive 0.11.0
[] - Add --version option to hive script
[] - Column statistics on an invalid column name results in IndexOutOfBoundsException
[] - NoClassDefFoundError is thrown when using lead/lag with kryo serialization
[] - Non-default OI constructors should be supported for backwards compatibility
[] - RetryingRawStore should not retry on logical failures (e.g. from commit)
[] - Add a schema tool for offline metastore schema upgrade
[] - TestWebHCatE2e is failing intermittently on trunk
[] - Current database in metastore.Hive is not consistent with SessionState
[] - Default log4j log level for WebHCat should be INFO not DEBUG
[] - Direct SQL fallback broken on Postgres
[] - Column statistics on a partitioned column should fail early with proper error message
[] - Missing metastore schema files for version 0.11
[] - Hive 0.11.0 is not working with pre-cdh3u6 and hadoop-0.23
[] - MapJoinProcessor ignores order of values in removing RS
[] - Add kryo into eclipse template
[] - Missing test file for HIVE-5199
[] - ant maven-build fails because hcatalog doesn't have a make-pom target
[] - NullPointerException in exec.Utilities
[] - Custom SerDe containing a nonSettable complex data type row object inspector throws cast exception with HIVE 0.11
[] - Apache builds fail with Target "make-pom" does not exist in the project "hcatalog".
[] - LazyDate goes into irretrievable NULL mode once inited with NULL once
[] - Revert HIVE-4322
[] - Address thread safety issues with HiveHistoryUtil
[] - WebHCatJTShim implementations are missing Apache license headers
[] - Change HCatalog spacing from 4 spaces to 2
[] - JDBC driver assumes executeStatement is synchronous
[] - move hbase storage handler to org.apache.hcatalog package
[] - remove hcatalog/shims directory
[] - create binary backwards compatibility layer hcatalog 0.12 and 0.11
[] - JDBC client's hive variables are not passed to HS2
[] - FunctionRegistry.getMethodInternal() should prefer method arguments with closer affinity to the original argument types
[] - Move all HCatalog classes to org.apache.hive.hcatalog
[] - Fix TempletonUtilsTest failure on Windows
[] - [HCatalog] Fix HCatalog unit tests on Windows
[] - [HCatalog] WebHCat does not honor user home directory
[] - [HCatalog] WebHCat should not override user.name parameter for Queue call
[] - Multiple table insert fails on count(distinct)
[] - ReduceSinkDeDuplication can pick the wrong partitioning columns
[] - HCatStorer fails to store boolean type
[] - [HCatalog] Fix HCatalog build issue on Windows
[] - Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[] - Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[] - Create an ORC test case that has a 0.11 ORC file
[] - A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[] - refactor org.apache.hadoop.mapred.HCatMapRedUtil
[] - FetchOperator fails on partitioned Avro data
[] - Publish HCatalog artifacts for Hadoop 2.x
[] - ORC files should have an option to pad stripes to the HDFS block boundaries
[] - Cleanup PTF code: remove code dealing with non standard sql behavior we had originally introduced
[] - Direct SQL for view is failing
[] - Some limit can be pushed down to map stage
[] - HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead
[] - Single sourced multi insert consists of native and non-native table mixed throws NPE
[] - RCFile::sync(long) missing 1 byte in System.arraycopy()
[] - OVER accepts general expression instead of just function
[] - exported metadata by HIVE-3068 cannot be imported because of wrong file name
[] - document what hive.server2.thrift.sasl.qop values mean in hive-default.xml.template
[] - HCatFieldSchema overrides equals() but not hashCode()
[] - HIVE_AUX_JARS_PATH should have : instead of , as separator since it gets appended to HADOOP_CLASSPATH
[] - bug in ExprProcFactory.genPruner
[] - Hive Metatool errors out if HIVE_OPTS is set
[] - HCatSchema.remove(HCatFieldSchema hcatFieldSchema) does not clean up fieldPositionMap
[] - udaf_percentile_approx.q is not deterministic
[] - Refactor MapJoin HashMap code to improve testability and readability
[] - StorageBasedAuthorization provider causes an NPE when asked to authorize from client side.
[] - Tests on list bucketing are failing again in hadoop2
[] - SMB joins fail based on bigtable selection policy.
[] - Remove unwanted file from the trunk.
[] - SessionState temp file gets created in history file directory
[] - [HCatalog] Create hcat.py, hcat_server.py to make HCatalog work on Windows
[] - Row sampling throws NPE when used in sub-query
[] - cast ( <string type> as bigint) returning null values
[] - Hive client filters partitions incorrectly via pushdown in certain cases involving "or"
[] - Fix some non-deterministic or not-updated tests
[] - Hive gets wrong result when partition has the same path but different schema or authority
[] - StorageBasedAuthorizationProvider masks lower level exception with IllegalStateException
[] - disable hivehistory logs by default
[] - Fix parallel order by on hadoop2
[] - Hive returns non-meaningful error message for ill-formed fs.default.name
[] - ORC Turn off dictionary encoding when number of distinct keys is greater than threshold
[] - Hcatalog's bin/hcat script doesn't respect HIVE_HOME
[] - Fix a concurrency bug in LazyBinaryUtils due to a static field
[] - timestamp - timestamp causes null pointer exception
[] - DBTokenStore gives compiler warnings
[] - ORC seek fails with non-zero offset or column projection
[] - Dynamic partitioning in HCatalog broken on external tables
[] - HIVE-3926 is committed in the state of not rebased to trunk
[] - PPD on virtual column of partitioned table is not working
[] - The TGT gotten from class 'CLIService' should be renewed on time
[] - Classes of metastore should not be included in MR-task
[] - Hive's metastore suffers from 1+N queries when querying partitions & is slow
[] - Explain Extended to show partition info for Fetch Task
[] - select * may incorrectly return empty fields with hbase-handler
[] - hive build with 0.20 is broken
[] - Support alternate table types for HiveServer2
[] - fix coverage org.apache.hadoop.hive.cli
[] - BinaryConverter does not respect nulls
[] - HS2 with kerberos- local task for map join fails
[] - unit tests fail on windows because of difference in input file size
[] - combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
[] - Fix the compiling error in TestHadoop20SAuthBridge
[] - Thread local PerfLog can get shared by multiple hiveserver2 sessions
[] - When deduplicating multiple SelectOperators, we should update RowResolver accordingly
[] - When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[] - update code generated by thrift for DemuxOperator and MuxOperator
[] - Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[] - Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality
[] - Support timestamps earlier than 1970 and later than 2038
[] - Date literals do not work properly in partition spec clause
[] - Add support for binary datatype to AvroSerde
[] - Update asm version in Hive
[] - serde_user_properties.q.out needs to be updated
[] - the type of all numeric constants is changed to double in the plan
[] - Desc table can't show non-ascii comments
[] - WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly
[] - PTest2 cleanup after merge
[] - Fix eclipse template files to use correct datanucleus libs
[] - When we merge two MapJoin MapRedTasks, the TableScanOperator of the second one should be removed
[] - With Dynamic partitioning, some queries would scan default partition even if query is not using it.
[] - Upgrade datanucleus to support JDK7
[] - Potential NPE in MetadataOnlyOptimizer
[] - "LOAD DATA" does not honor permission inheritance
[] - PTFOperator fails resetting PTFPersistence
[] - Timestamp type constants cannot be deserialized in JDK 1.6 or less
[] - make checkstyle ignore IntelliJ files and templeton e2e files
[] - Fix the mismatched column names in package.jdo
[] - TestHadoop20SAuthBridge tests fail sometimes because of race condition
[] - Beeline help text does not contain -f and -e parameters
[] - A complex create view statement fails with new Antlr 3.4
[] - [HCatalog] WebHCat e2e tests fail on Hadoop 2
[] - hive config template is not parse-able due to angle brackets in description
[] - NPE - subquery smb joins fails
[] - ORC readers should have a better error detection for non-ORC files
[] - Join on more than 2^31 records on single reducer failed (wrong results)
[] - SequenceId in operator is not thread safe
[] - HiveLockObjects: Unlocking retries/times out when query contains ":"
[] - webhcat_config.sh should set default values for HIVE_HOME and HCAT_PREFIX that work with default build tree structure
[] - Correctness issue with MapJoins using the null safe operator
[] - -Dbuild.profile=core fails
[] - testCliDriver_load_hdfs_file_with_space_in_the_name fails on hadoop 2
[] - junit timeout needs to be updated
[] - Fix TestCliDriver.ptf_npath.q on 0.23
[] - Build profiles: Partial builds for quicker dev
[] - Fix eclipse template classpath to include the BoneCP lib
[] - TestDynamicSerDe failed with IBM JDK
[] - Hive metastore hangs
[] - Fix eclipse template classpath to include the correct jdo lib
[] - Test clientnegative/nested_complex_neg.q got broken due to 4580
[] - Refactor exec package
[] - TestWebHCatE2e checkstyle violation causes all tests to fail
[] - Change DDLTask to report errors using canonical error messages rather than http status codes
[] - Logical explain plan
[] - HiveHistory.log need to replace '\r' with space before writing Entry.value to historyfile
[] - Fix url check for missing "/" or "/<db>" after hostname in jdbc uri
[] - HiveLockObjectData is not compared properly
[] - INLINE UDTF doesn't convert types properly
[] - Indices can't be built on tables whose schema info comes from SerDe
[] - (Slightly) break up the SemanticAnalyzer monstrosity
[] - Adjust WebHCat e2e tests until HIVE-4703 is addressed
[] - javax.jdo : jdo2-api dependency not in Maven Central
[] - parallel order by fails for small datasets
[] - For outerjoins, joinEmitInterval might make wrong result
[] - ArrayIndexOutOfBounds exception for deeply nested structs
[] - hive.exec.parallel=true doesn't work on hadoop-2
[] - Missing "/" or "/<dbname>" in hs2 jdbc uri switches mode to embedded mode
[] - Semantic analysis fails in presence of certain literals in on clause
[] - LazyTimestamp goes into irretrievable NULL mode once inited with NULL once
[] - Implement isCaseSensitive for Hive JDBC driver
[] - LEFT SEMI JOIN generates wrong results when the number of rows belonging to a single key of the right table exceed hive.join.emit.interval
[] - show create table creating unusable DDL when field delimiter is \001
[] - hcatalog/webhcat scripts in tar.gz don't have execute permissions set
[] - NPE when we call isSame from an instance of ExprNodeConstantDesc with null value
[] - Constant agg parameters will be replaced by ExprNodeColumnDesc with single-sourced multi-gby cases
[] - HIVE-2379 is missing hbase.jar itself
[] - Upgrade Hadoop 0.23 profile to 2.0.5-alpha
[] - Making changes to webhcat-site.xml have no effect
[] - ant testreport doesn't include any HCatalog tests
[] - JDBC2 won't compile with JDK7
[] - ObjectStore.getPMF has concurrency problems
[] - ZooKeeperHiveLockManager.unlockPrimitive has race condition with threads
[] - Reading of partitioned Avro data fails because of missing properties
[] - WebHCat can deadlock Hadoop if the number of concurrently running tasks is higher than or equal to the number of mappers
[] - Fix TestCliDriver.list_bucket_dml_{2,4,5,9,12,13}.q on 0.23
[] - Support configurable domain name for HiveServer2 LDAP authentication using Active Directory
[] - HCatalog HBaseHCatStorageHandler is not returning all the data
[] - ErrorMsg has several messages that reuse the same error code
[] - Fix TestCliDriver.list_bucket_query_oneskew_{1,2,3}.q on 0.23
[] - Fix TestCliDriver.combine2.q on 0.23
[] - Fix TestCliDriver.skewjoin_union_remove_{1,2}.q on 0.23
[] - Fix TestCliDriver.{recursive_dir.q,sample_islocalmode_hook.q,input12.q,input39.q,auto_join14.q} on 0.23
[] - Fix non-deterministic TestCliDriver on 0.23
[] - Fix TestCliDriver.truncate_* on 0.23
[] - orc_createas1.q has minor inconsistency
[] - Column stats: Distinct value estimator should use hash functions that are pairwise independent
[] - Test output need to be updated for Windows only unit test in TestCliDriver
[] - HCatalog checkstyle violation after HIVE-2670
[] - Unit test compile fail at hbase-handler project on Windows because of illegal escape character
[] - Failed to create a table from existing file if file path has spaces
[] - Fix concurrency bug in serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
[] - NullPointerException if typeinfo and nativesql commands are executed at beeline before a DB connection is established
[] - skewjoin.q is failing in hadoop2
[] - Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
[] - Hive CLI leaves behind the per session resource directory on non-interactive invocation
[] - Remove unused MR Temp file localization from Tasks
[] - TestNegativeCliDriver failure message if cmd succeeds is misleading
[] - Invalid column names allowed when created dynamically by a SerDe
[] - alter view rename NPEs with authorization on.
[] - Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[] - join_vc.q is not deterministic
[] - HIVE-3393 brought in Jackson library, and these four jars should be packed into hive-exec.jar
[] - beeline always return the same error message twice
[] - HS2 doesn't nest exceptions properly (fun debug times)
[] - hive build fails with hadoop 0.20
[] - Broken link in HCat 0.5 doc (Reader and Writer Interfaces)
[] - ColumnPruner cannot preserve RS key columns corresponding to un-selected join keys in columnExprMap
[] - JOIN-GRP BY-DISTINCT fails with NPE when mapjoin.mapreduce=true
[] - HCat e2e tests broken by changes to Hive's describe table formatting
[] - Changes to Pig's test harness broke HCat e2e tests
[] - Switch RCFile default to LazyBinaryColumnarSerDe
[] - build fails after branch (hcatalog version not updated)
[] - Auto join conversion fails in certain cases (empty tables, empty partitions, no partitions)
[] - FetchOperator slows down SMB map joins by 50% when there are many partitions
[] - local_mapred_error_cache fails on some hadoop versions
[] - SMB Operator spills to disk like it's 1999
[] - Fix continue.on.failure in unit tests to -well- continue on failure in unit tests
[] - Build fails with hcatalog checkstyle error
[] - Illogical InvalidObjectException thrown when using multiple aggregate functions with star columns
[] - Add pseudo-BNF grammar for RCFile to Javadoc
[] - beeline module tests don't get run by default
[] - Column access not tracked properly for partitioned tables
[] - webhcat e2e tests succeed regardless of exitvalue
[] - support AS keyword for table alias
[] - Remove unused join configuration parameter: hive.mapjoin.cache.numrows
[] - Remove unused join configuration parameter: hive.mapjoin.size.key
[] - MetaStoreUtils.java.orig checked in mistakenly by HIVE-4409
[] - Star argument without table alias for UDTF is not working
[] - Many new failures on hadoop 2
[] - ant thriftif generated code that is checkedin is not up-to-date
[] - physical optimizer changes for auto sort-merge join
[] - Hive/HBase integration could be improved
[] - Lateral view makes invalid result if CP is disabled
[] - beeline always exits with 0 status, should exit with non-zero status on error
[] - Hcatalog build fails on Windows because javadoc command exceeds length limit
[] - ant thriftif
[] - SkewedInfo in Metastore Thrift API cannot be deserialized in Python
Improvement
[] - HIVE-3978 broke the command line option --auxpath
[] - Support metastore version consistency check
[] - Asynchronous execution in HiveServer2 to run a query in non-blocking mode
[] - partition name filtering uses suboptimal datastructures
[] - Hive plan serialization is slow
[] - log more stuff via PerfLogger
[] - Change type compatibility methods to use PrimitiveCategory rather than TypeInfo
[] - allow getting all partitions for table to also use direct SQL path
[] - Fetch task aggregation for simple group by query
[] - WebHCat needs to support proxy users
[] - Remove obsolete code on SemanticAnalyzer#genJoinTree
[] - ExprNodeColumnDesc doesn't distinguish partition and virtual columns, causing partition pruner to receive the latter
[] - Insert + orderby + limit does not need additional RS for limiting rows
[] - refactor/clean up partition name pruning to be usable inside metastore server
[] - Compiler should capture UDF as part of read entities
[] - Create a SARG interface for RecordReaders
[] - Put deterministic ordering in the top-K ngrams output of UDF context_ngrams()
[] - Re-factor HiveServer2 JDBC PreparedStatement to avoid duplicate code
[] - add ability to skip javadoc during build
[] - Don't serialize unnecessary fields in query plan
[] - WriteLockTest and ZNodeNameTest do not follow test naming pattern
[] - lastAlias in CommonJoinOperator is not used
[] - Merge a Map-only task to its child task
[] - PTFTranslator hardcodes ranking functions
[] - PTest2 handle Spot Price increases gracefully and improve rsync parallelism
[] - Separate MapredWork into MapWork and ReduceWork
[] - Add support for pulling HBase columns with prefixes
[] - Sort "show grant" result to improve usability and testability
[] - In ExecReducer, remove tag from the row which will be passed to the first Operator at the Reduce-side
[] - Identical methods PTFDeserializer.addOIPropertiestoSerDePropsMap(), PTFTranslator.addOIPropertiestoSerDePropsMap()
[] - Create new parallel unit test environment
[] - Sort candidate functions in case of UDFArgumentException
[] - Enable client-side caching for scans on HBase
[] - Make KW_OUTER optional in outer joins
[] - RetryingHMSHandler logs too many error messages
[] - JDBC2 does not support VOID type
[] - Allow hive tests to specify an alternative to /tmp
[] - Simple reconnection support for jdbc2
[] - JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM
[] - Script hcat is overriding HIVE_CONF_DIR variable
[] - Support PreparedStatement.setObject
[] - MR temp directory conflicts in case of parallel execution mode
[] - HCatalog checkstyle violation after HIVE-4578
[] - Enforce minimum ant version required in build script
[] - Cache evaluation result of deterministic expression and reuse it
[] - Improve RCFile::sync(long) 10x
[] - Size of aggregation buffer which uses non-primitive type is not estimated correctly
[] - Prevent incompatible column type changes
[] - Make the deleteData flag accessible from DropTable/Partition events
[] - optimize count(distinct) with hive.map.groupby.sorted
[] - Beeline should support the -f option
[] - Single sourced multi query cannot handle lateral view
[] - optimize hive.enforce.sorting and hive.enforce bucketing join
New Feature
[] - Support in memory PTF partitions
[] - Implement predicate pushdown for ORC
[] - Convenience UDFs for binary data type
[] - The RLE encoding for ORC can be improved
[] - Enable QOP configuration for Hive Server 2 thrift transport
[] - Port Hadoop streaming's counters/status reporters to Hive Transforms
[] - Add DBTokenStore to store Delegation Tokens in DB
[] - Build An Analytical SQL Engine for MapReduce
[] - add a new optimizer for query correlation discovery and optimization
[] - Support group by on struct type
[] - In ORC, add boolean noNulls flag to column stripe metadata
[] - Add parallel ORDER BY to Hive
[] - A cluster test utility for Hive
[] - Add exchange partition in Hive
[] - Column truncation
[] - Modify Hive build to enable compiling and running Hive with JDK7
[] - Move VerifyingObjectStore into ql package
[] - HIVE-2608 didn't remove udtf_not_supported2.q test
[] - Make Hive compile and run with JDK7
[] - Meaningless warning message from TypeCheckProcFactory
[] - beeline code should have apache license headers
[] - Comments in CommonJoinOperator for aliasTag is not valid
[] - HCat needs to get current Hive jars instead of pulling them from maven repo
[] - Missing file on HIVE-4068
[] - Add q file tests for ORC predicate pushdown
[] - direct SQL perf optimization cannot be tested well
[] - Newly added test TestSessionHooks is failing on trunk
[] - Enhance coverage of package org.apache.hadoop.hive.ql.udf
[] - orc_dictionary_threshold is not deterministic
[] - Stat information like numFiles and totalSize is not correct when a sub-directory exists
[] - Test result of ppd_vc.q is not updated
[] - Improve test coverage of package org.apache.hadoop.hive.ql.optimizer.pcr
[] - Increase coverage of package org.apache.hadoop.hive.common.metrics
[] - Enhance coverage of package org.apache.hadoop.hive.ql.exec.errors
[] - improve test coverage of package org.apache.hadoop.hive.ql.udf.xml
[] - Improve test coverage of package org.apache.hadoop.hive.ql.io
[] - auto_sortmerge_join_9.q throws NPE but test is succeeded
[] - Failing on TestSemanticAnalysis.testAddReplaceCols in trunk
[] - HBase e2e tests on single nodes on Hadoop 2.0.3 with "dfs.client.read.shortcircuit" turning on for HBase
[] - NotificationListener is not thread safe