[SparkAPI Java Edition] JavaPairRDD: countByValue and countByValueApprox

Reader submission 741 2022-05-29


/**
 * Return the count of each unique value in this RDD as a map of (value, count) pairs. The final
 * combine step happens locally on the master, equivalent to running a single reduce task.
 */

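Returns the count of each distinct value in the RDD as a map of (value, count) pairs. For a JavaPairRDD, each "value" being counted is a whole (key, value) tuple, so two elements are counted together only when both key and value match.

The return value is a local Map, not an RDD.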

// java
public java.util.Map<Tuple2<K, V>, Long> countByValue()
// scala
def countByValue(): Map[(K, V), Long]

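import com.google.common.collect.Lists;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Map;

public class CountByValue {
    public static void main(String[] args) {
        // Needed on Windows so that Spark can locate the Hadoop binaries.
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.1");
        SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("Spark_DEMO");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        // Six (key, value) pairs spread over 3 partitions; ("cat", "11") appears twice.
        JavaPairRDD<String, String> javaPairRDD1 = sc.parallelizePairs(Lists.newArrayList(
                new Tuple2<String, String>("cat", "11"),
                new Tuple2<String, String>("dog", "22"),
                new Tuple2<String, String>("cat", "11"),
                new Tuple2<String, String>("pig", "44"),
                new Tuple2<String, String>("duck", "55"),
                new Tuple2<String, String>("cat", "66")), 3);
        // Each map key is a whole (key, value) tuple; the map value is its count.
        Map<Tuple2<String, String>, Long> value = javaPairRDD1.countByValue();
        for (Map.Entry<Tuple2<String, String>, Long> entry : value.entrySet()) {
            System.out.println(entry.getKey() + "->" + entry.getValue());
        }
    }
}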

19/03/20 17:15:31 INFO DAGScheduler: Job 0 finished: countByValue at CountByValue.java:23, took 1.093040 s
19/03/20 17:15:31 INFO SparkContext: Invoking stop() from shutdown hook
(duck,55)->1
(dog,22)->1
(pig,44)->1
(cat,66)->1
(cat,11)->2
19/03/20 17:15:31 INFO SparkUI: Stopped Spark web UI at http://10.124.209.6:4040
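The output shows that ("cat", "11") is counted twice while ("cat", "66") is counted separately, because the whole pair is the counted value. As a rough illustration of that grouping rule only, here is a plain-Java sketch that runs without Spark. The class name `CountByValueSketch` and the helper `pair` are hypothetical, `AbstractMap.SimpleEntry` merely stands in for `scala.Tuple2`, and this is not how Spark implements the operation internally.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountByValueSketch {
    // Hypothetical helper: a (key, value) pair standing in for scala.Tuple2.
    public static Map.Entry<String, String> pair(String key, String value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Count how many times each distinct (key, value) pair occurs.
    // Two pairs fall into the same group only if both key and value are equal.
    public static Map<Map.Entry<String, String>, Long> countByValue(
            List<Map.Entry<String, String>> pairs) {
        return pairs.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Same six pairs as in the Spark example above.
        List<Map.Entry<String, String>> pairs = Arrays.asList(
                pair("cat", "11"), pair("dog", "22"), pair("cat", "11"),
                pair("pig", "44"), pair("duck", "55"), pair("cat", "66"));
        countByValue(pairs).forEach((k, v) -> System.out.println(k + "->" + v));
    }
}
```

If the goal is a count per key regardless of value, `countByKey()` on the pair RDD is the closer fit.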

/**
 * Approximate version of countByValue().
 *
 * The confidence is the probability that the error bounds of the result will
 * contain the true value. That is, if countApprox were called repeatedly
 * with confidence 0.9, we would expect 90% of the results to contain the
 * true count. The confidence must be in the range [0,1] or an exception will
 * be thrown.
 *
 * @param timeout maximum time to wait for the job, in milliseconds
 * @param confidence the desired statistical confidence in the result
 * @return a potentially incomplete result, with error bounds
 */

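An approximate version of countByValue().

The confidence must be in the range [0, 1]; otherwise an exception is thrown.

timeout: the maximum time to wait for the job, in milliseconds

confidence: the desired statistical confidence in the result

Returns: a potentially incomplete result, with error bounds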

// java
public PartialResult<java.util.Map<Tuple2<K, V>, BoundedDouble>> countByValueApprox(long timeout)
public PartialResult<java.util.Map<Tuple2<K, V>, BoundedDouble>> countByValueApprox(long timeout, double confidence)
// scala
def countByValueApprox(timeout: Long): PartialResult[Map[(K, V), BoundedDouble]]
def countByValueApprox(timeout: Long, confidence: Double): PartialResult[Map[(K, V), BoundedDouble]]
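The original post gives no usage example for countByValueApprox, so here is a minimal sketch of how the two-argument overload might be called. It assumes a local Spark installation on the classpath; the class name `CountByValueApproxExample`, the 1000 ms timeout, and the 0.9 confidence are illustrative choices, not values from the article. It reads the partial result with `initialValue()` and prints the estimate and bounds held in each `BoundedDouble`.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.partial.BoundedDouble;
import org.apache.spark.partial.PartialResult;
import scala.Tuple2;

import java.util.Arrays;
import java.util.Map;

public class CountByValueApproxExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local")
                .setAppName("CountByValueApproxExample");
        // JavaSparkContext is Closeable, so try-with-resources stops it on exit.
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, String> pairs = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<String, String>("cat", "11"),
                    new Tuple2<String, String>("dog", "22"),
                    new Tuple2<String, String>("cat", "11"),
                    new Tuple2<String, String>("pig", "44"),
                    new Tuple2<String, String>("duck", "55"),
                    new Tuple2<String, String>("cat", "66")), 3);

            // Illustrative settings: wait at most 1000 ms, ask for 90% confidence.
            PartialResult<Map<Tuple2<String, String>, BoundedDouble>> partial =
                    pairs.countByValueApprox(1000L, 0.9);

            // initialValue() is the result available when the call returned;
            // it may be incomplete if the timeout expired before the job finished.
            Map<Tuple2<String, String>, BoundedDouble> counts = partial.initialValue();
            for (Map.Entry<Tuple2<String, String>, BoundedDouble> entry : counts.entrySet()) {
                BoundedDouble bound = entry.getValue();
                System.out.println(entry.getKey()
                        + " -> mean=" + bound.mean()
                        + ", low=" + bound.low()
                        + ", high=" + bound.high()
                        + ", confidence=" + bound.confidence());
            }
        }
    }
}
```

On a dataset this small the job will normally finish well inside the timeout, so the bounds are expected to be tight; the approximate variant is mainly of interest when a full count would take longer than the caller is willing to wait. `PartialResult` also offers `getFinalValue()`, which blocks until the complete result is available.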
