Apache Spark 1.6.2 Released, Cluster Computing Environment

Published 2016-06-28 04:51:58 | Source: reader submission

Apache Spark

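Spark is a general-purpose parallel computing framework, similar to Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. Spark has the advantages of Hadoop MapReduce. Unlike MapReduce, however, a job's intermediate output can be kept in memory, so it no longer has to be written to and read back from HDFS. Spark is therefore better suited to algorithms that iterate over map-reduce steps, such as those used in data mining and machine learning.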
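
To make the point about in-memory intermediate results concrete, here is a minimal plain-Python analogy. It is not Spark code: `Source`, `iterate_without_cache`, and `iterate_with_cache` are hypothetical names invented for this sketch, and `Source.load()` merely stands in for a read from storage such as HDFS. In Spark itself the comparable step is calling `cache()` or `persist()` on an RDD.

```python
class Source:
    """Stand-in for external storage; counts how many times it is read."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._rows)


def iterate_without_cache(source, steps):
    """Re-read the input and recompute the intermediate result on every pass."""
    total = 0.0
    for _ in range(steps):
        squared = [x * x for x in source.load()]  # intermediate result rebuilt each time
        total += sum(squared)
    return total


def iterate_with_cache(source, steps):
    """Compute the intermediate result once and keep it in memory."""
    squared = [x * x for x in source.load()]  # computed a single time
    total = 0.0
    for _ in range(steps):
        total += sum(squared)  # later passes reuse the in-memory list
    return total


if __name__ == "__main__":
    uncached, cached = Source([1, 2, 3]), Source([1, 2, 3])
    print(iterate_without_cache(uncached, 5), uncached.reads)
    print(iterate_with_cache(cached, 5), cached.reads)
```

Both functions return the same total; only the number of reads from the source differs, which is the cost an iterative job avoids when the intermediate data stays in memory.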

Apache Spark 1.6.2 has been released.

The changelog is as follows:

Sub-task

[SPARK-15613] - Incorrect days to millis conversion
[SPARK-15723] - SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
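
The title of SPARK-15613 does not say what went wrong, so the following is only a sketch of what a days-to-milliseconds conversion has to do, not Spark's implementation. The function `days_to_millis` is a hypothetical name: it takes a count of days since 1970-01-01 and a time zone, and returns the epoch milliseconds of local midnight on that date. The key point is that the zone offset must be the one in effect at that local midnight. The example uses fixed-offset zones from the standard library, so daylight-saving transitions are outside what it demonstrates.

```python
from datetime import date, datetime, timedelta, timezone

# Reference instant: 1970-01-01T00:00:00Z
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_to_millis(days, tz):
    """Epoch milliseconds of local midnight, `days` days after 1970-01-01 in `tz`."""
    local_date = date(1970, 1, 1) + timedelta(days=days)
    local_midnight = datetime(
        local_date.year, local_date.month, local_date.day, tzinfo=tz
    )
    # Subtracting two aware datetimes accounts for the zone offset.
    return (local_midnight - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    utc_plus_8 = timezone(timedelta(hours=8))
    print(days_to_millis(1, timezone.utc))
    print(days_to_millis(0, utc_plus_8))
```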

Bug

[SPARK-8428] - TimSort Comparison method violates its general contract with CLUSTER BY
[SPARK-10722] - Uncaught exception: RDDBlockId not found in driver-heartbeater
[SPARK-11327] - spark-dispatcher doesn't pass along some spark properties
[SPARK-11507] - Error thrown when using BlockMatrix.add
[SPARK-12655] - GraphX does not unpersist RDDs
[SPARK-12712] - test-dependencies.sh script fails when run against empty .m2 cache
[SPARK-12941] - Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
[SPARK-13023] - Check for presence of 'root' module after computing test_modules, not changed_modules
[SPARK-13207] - _SUCCESS should not break partition discovery
[SPARK-13227] - Risky apply() in OpenHashMap
[SPARK-13242] - Moderately complex `when` expression causes code generation failure
[SPARK-13327] - colnames()<- allows invalid column names
[SPARK-13352] - BlockFetch does not scale well on large block
[SPARK-13444] - QuantileDiscretizer chooses bad splits on large DataFrames
[SPARK-13522] - Executor should kill itself when it's unable to heartbeat to the driver more than N times
[SPARK-13566] - Deadlock between MemoryStore and BlockManager
[SPARK-13622] - Issue creating level db file for YARN shuffle service if URI is used in yarn.nodemanager.local-dirs
[SPARK-13631] - getPreferredLocations race condition in spark 1.6.0?
[SPARK-13642] - Properly handle signal kill of ApplicationMaster
[SPARK-13648] - org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK
[SPARK-13652] - TransportClient.sendRpcSync returns wrong results
[SPARK-13697] - TransformFunctionSerializer.loads doesn't restore the function's module name if it's '__main__'
[SPARK-13705] - UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount
[SPARK-13711] - Apache Spark driver stopping JVM when master not available
[SPARK-13755] - Escape quotes in SQL plan visualization node labels
[SPARK-13772] - DataType mismatch about decimal
[SPARK-13803] - Standalone master does not balance cluster-mode drivers across workers
[SPARK-13806] - SQL round() produces incorrect results for negative values
[SPARK-13845] - BlockStatus and StreamBlockId keep on growing result driver OOM
[SPARK-13850] - TimSort Comparison method violates its general contract
[SPARK-13901] - We get wrong logdebug information when jump to the next locality level.
[SPARK-13958] - Executor OOM due to unbounded growth of pointer array in Sorter
[SPARK-14006] - Builds of 1.6 branch fail R check
[SPARK-14074] - Use fixed version of install_github in SparkR build
[SPARK-14159] - StringIndexerModel sets output column metadata incorrectly
[SPARK-14187] - Incorrect use of binarysearch in SparseMatrix
[SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
[SPARK-14219] - Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex
[SPARK-14232] - Event timeline on job page doesn't show if an executor is removed with multiple line reason
[SPARK-14243] - updatedBlockStatuses does not update correctly when removing blocks
[SPARK-14261] - Memory leak in Spark Thrift Server
[SPARK-14298] - LDA should support disable checkpoint
[SPARK-14322] - Use treeAggregate instead of reduce in OnlineLDAOptimizer
[SPARK-14357] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[SPARK-14363] - Executor OOM due to a memory leak in Sorter
[SPARK-14368] - Support python.spark.worker.memory with upper-case unit
[SPARK-14454] - Better exception handling while marking tasks as failed
[SPARK-14468] - Always enable OutputCommitCoordinator
[SPARK-14495] - Distinct aggregation cannot be used in the having clause
[SPARK-14563] - SQLTransformer.transformSchema is not implemented correctly
[SPARK-14665] - PySpark StopWordsRemover default stopwords are Java object
[SPARK-14671] - Pipeline.setStages needs to handle Array non-covariance
[SPARK-14679] - UI DAG visualization causes OOM generating data
[SPARK-14739] - Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices
[SPARK-14757] - Incorrect behavior of Join operation in Spark SQL JOIN : "false" in the left table is joined to "null" on the right table
[SPARK-14915] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
[SPARK-14965] - StructType throws exception for missing field
[SPARK-15165] - Codegen can break because toCommentSafeString is not actually safe
[SPARK-15209] - Web UI's timeline visualizations fails to render if descriptions contain single quotes
[SPARK-15260] - UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks
[SPARK-15262] - race condition in killing an executor and reregistering an executor
[SPARK-15528] - conv function returns inconsistent result for the same data
[SPARK-15601] - CircularBuffer's toString() to print only the contents written if buffer isn't full
[SPARK-15736] - Gracefully handle loss of DiskStore files
[SPARK-15754] - org.apache.spark.deploy.yarn.Client changes the credential of current user
[SPARK-15892] - Incorrectly merged AFTAggregator with zero total count
[SPARK-15975] - Improper Popen.wait() return code handling in dev/run-tests
[SPARK-16017] - YarnClientSchedulerBackend now registers backends as IPs instead of Hostnames which causes all tasks to run with RACK_LOCAL locality.
[SPARK-16035] - The SparseVector parser fails checking for valid end parenthesis
[SPARK-16086] - Python UDF failed when there is no arguments
[SPARK-16173] - Can't join describe() of DataFrame in Scala 2.10
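
As an illustration of the kind of problem behind SPARK-13806 (round() on negative values), here is a small sketch of half-up rounding, where ties move away from zero for both signs. It assumes half-up is the intended semantics, which the issue title alone does not confirm, and `round_half_up` is a hypothetical helper, not a Spark function. It uses Python's `decimal` module because the built-in `round` uses round-half-to-even instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, scale=0):
    """Round `value` to `scale` decimal places, with ties going away from zero."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    # Go through str() so float inputs are not expanded to their binary value.
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for sample in ("2.5", "-2.5", "-1.245"):
        print(sample, round_half_up(sample), round_half_up(sample, 2))
```

A symmetric rule like this is easy to get wrong when it is implemented as "add 0.5 and truncate", since that shortcut treats negative inputs differently from positive ones.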

Documentation

[SPARK-14618] - RegressionEvaluator doc out of date
[SPARK-15223] - spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

Improvement

[SPARK-13599] - Groovy-all ends up in spark-assembly if hive profile set
[SPARK-13601] - Invoke task failure callbacks before calling outputstream.close()
[SPARK-13663] - Upgrade Snappy Java to 1.1.2.1
[SPARK-13810] - Add Port Configuration Suggestions on Bind Exceptions
[SPARK-14058] - Incorrect docstring in Window.orderBy
[SPARK-14107] - PySpark spark.ml GBT algs need seed Param
[SPARK-14149] - Log exceptions in tryOrIOException
[SPARK-14242] - avoid too many copies in network when a network frame is large
[SPARK-14787] - Upgrade Joda-Time library from 2.9 to 2.9.3
[SPARK-15205] - Codegen can compile the same source code more than twice
[SPARK-15827] - Publish Spark's forked sbt-pom-reader to Maven Central

New Feature

[SPARK-11515] - QuantileDiscretizer should take random seed
[SPARK-13465] - Add a task failure listener to TaskContext
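
The idea behind SPARK-13465, a listener that runs when a task fails, can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Spark's `TaskContext`: the class `ToyTaskContext`, its methods, and `run_task` are all hypothetical names invented here.

```python
class ToyTaskContext:
    """Toy stand-in for a task context that keeps a list of failure listeners."""

    def __init__(self):
        self._failure_listeners = []

    def add_task_failure_listener(self, listener):
        """Register a callable taking (context, error); returns self for chaining."""
        self._failure_listeners.append(listener)
        return self

    def mark_task_failed(self, error):
        """Invoke every registered listener with the error that failed the task."""
        for listener in self._failure_listeners:
            listener(self, error)


def run_task(context, body):
    """Run `body`; on an exception, notify the listeners and re-raise."""
    try:
        return body()
    except Exception as error:
        context.mark_task_failed(error)
        raise


def failing_body():
    raise RuntimeError("boom")


if __name__ == "__main__":
    seen = []
    ctx = ToyTaskContext().add_task_failure_listener(
        lambda c, e: seen.append(str(e))
    )
    try:
        run_task(ctx, failing_body)
    except RuntimeError:
        pass
    print(seen)
```

Running the listeners before the exception propagates gives cleanup code a chance to react to the failure, for example by discarding partial output instead of committing it.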

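Previous releases:
Apache Spark 2.2.0 officially released, improving usability and stability
The Spark 2.0 era has fully arrived: version 2.0.1 released
Apache Spark 2.0.0 released, with API updates
Apache Spark 1.6.2 released, cluster computing environment
Spark 2.0 preview: simpler, faster, smarter
Spark 2.7.6 released, open-source cluster computing environment
Apache Spark 1.6.1 released, cluster computing environment
Apache Spark 2.0 may debut as early as April this year
Apache Spark 1.6 officially released, with major performance improvements
Apache Spark 1.6 preview: simpler search
Apache Spark 1.5.2 released, open-source cluster computing environment
Apache Spark 1.5.1 released, open-source cluster computing environment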
