Apache Spark 1.0.0
pts/spark-1.0.0
- 04 August 2022 -
Initial commit of Apache Spark benchmark.
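
For reference, this profile installs and runs through the standard Phoronix Test Suite workflow; the Row Count and Partitions prompts it presents come from the test options defined in test-definition.xml below:

phoronix-test-suite install pts/spark-1.0.0
phoronix-test-suite benchmark pts/spark-1.0.0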
downloads.xml
<?xml version="1.0"?>
<!--Phoronix Test Suite v10.8.4-->
<PhoronixTestSuite>
  <Downloads>
    <Package>
      <URL>https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz</URL>
      <MD5>735173ca7be2e5e55715499fbc719022</MD5>
      <SHA256>306b550f42ce1b06772d6084c545ef8448414f2bf451e0b1175405488f2a322f</SHA256>
      <FileName>spark-3.3.0-bin-hadoop3.tgz</FileName>
      <FileSize>299321244</FileSize>
    </Package>
    <Package>
      <URL>https://github.com/DIYBigData/pyspark-benchmark/archive/3bbf3e521763517bc9aa73504dd66a5bdeb5b6af.zip</URL>
      <MD5>82e9dac0213fe5a331d47358e8e754f3</MD5>
      <SHA256>149e7d5dd2c86111cf94a9299ef1da1bd1c4a136486b1115a4314dda7304ec55</SHA256>
      <FileName>pyspark-benchmark-3bbf3e521763517bc9aa73504dd66a5bdeb5b6af.zip</FileName>
      <FileSize>299321244</FileSize>
    </Package>
  </Downloads>
</PhoronixTestSuite>
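
To sanity-check a manually downloaded archive against the checksums above (the Phoronix Test Suite performs this verification itself during installation):

# Verify the Spark tarball against the SHA256 value from downloads.xml
echo "306b550f42ce1b06772d6084c545ef8448414f2bf451e0b1175405488f2a322f  spark-3.3.0-bin-hadoop3.tgz" | sha256sum -c -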
install.sh
#!/bin/sh
tar -xf spark-3.3.0-bin-hadoop3.tgz
unzip -o pyspark-benchmark-3bbf3e521763517bc9aa73504dd66a5bdeb5b6af.zip
rm -rf pyspark-benchmark
mv pyspark-benchmark-3bbf3e521763517bc9aa73504dd66a5bdeb5b6af pyspark-benchmark

# Avoid out of memory errors
echo "spark.driver.memory 6g" > ~/spark-3.3.0-bin-hadoop3/conf/spark-defaults.conf

echo "#!/bin/bash
cd ~/spark-3.3.0-bin-hadoop3/bin
./spark-submit --name 'benchmark-shuffle' \$HOME/pyspark-benchmark/benchmark-shuffle.py \$HOME/test-data > \$LOG_FILE 2>&1
./spark-submit --name 'benchmark-cpu' \$HOME/pyspark-benchmark/benchmark-cpu.py \$HOME/test-data >> \$LOG_FILE 2>&1
" > spark
chmod +x spark
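
The generated spark wrapper is the script the harness executes for each run, with $LOG_FILE pointing at the log that the results parsers below later scan. A run can be reproduced by hand along these lines (hypothetical log path; assumes pre.sh has already populated $HOME/test-data):

# Run the shuffle and CPU workloads, capturing output for the parsers
LOG_FILE=/tmp/spark-run.log ./spark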
post.sh
#!/bin/sh
rm -rf $HOME/test-data
pre.sh
#!/bin/sh
cd ~/spark-3.3.0-bin-hadoop3/bin
rm -rf $HOME/test-data
./spark-submit --name 'generate-benchmark-test-data' $HOME/pyspark-benchmark/generate-data.py $HOME/test-data $@
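
pre.sh forwards the user-selected test options verbatim via $@, so choosing 1000000 rows and 100 partitions makes the data-generation call equivalent to the following (hypothetical selection shown):

./spark-submit --name 'generate-benchmark-test-data' $HOME/pyspark-benchmark/generate-data.py $HOME/test-data -r 1000000 -p 100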
results-definition.xml
<?xml version="1.0"?>
<!--Phoronix Test Suite v10.8.4-->
<PhoronixTestSuite>
  <ResultsParser>
    <OutputTemplate>22/08/03 15:39:58 INFO __main__: SHA-512 benchmark time = #_RESULT_# seconds for 1,000,000 hashes</OutputTemplate>
    <LineHint>SHA-512 benchmark time</LineHint>
    <ResultBeforeString>seconds</ResultBeforeString>
    <AppendToArgumentsDescription>SHA-512 Benchmark Time</AppendToArgumentsDescription>
  </ResultsParser>
  <ResultsParser>
    <OutputTemplate>22/08/03 15:39:58 INFO __main__: Calculate Pi benchmark = #_RESULT_# seconds with pi = 3.1416092936, samples = 5,000,000,000</OutputTemplate>
    <LineHint>Calculate Pi benchmark =</LineHint>
    <ResultBeforeString>seconds</ResultBeforeString>
    <AppendToArgumentsDescription>Calculate Pi Benchmark</AppendToArgumentsDescription>
  </ResultsParser>
  <ResultsParser>
    <OutputTemplate>22/08/03 15:39:58 INFO __main__: Calculate Pi benchmark using dataframe = #_RESULT_# seconds with pi = 3.1416040392, samples = 5,000,000,000</OutputTemplate>
    <LineHint>Calculate Pi benchmark using dataframe</LineHint>
    <ResultBeforeString>seconds</ResultBeforeString>
    <AppendToArgumentsDescription>Calculate Pi Benchmark Using Dataframe</AppendToArgumentsDescription>
  </ResultsParser>
  <ResultsParser>
    <OutputTemplate>22/08/03 18:06:24 INFO __main__: Group By test time = #_RESULT_# seconds</OutputTemplate>
    <LineHint>Group By test time</LineHint>
    <ResultBeforeString>seconds</ResultBeforeString>
    <AppendToArgumentsDescription>Group By Test Time</AppendToArgumentsDescription>
  </ResultsParser>
  <ResultsParser>
    <OutputTemplate>22/08/03 18:06:24 INFO __main__: Repartition test time = #_RESULT_# seconds (200 partitions)</OutputTemplate>
    <LineHint>Repartition test time</LineHint>
    <ResultBeforeString>seconds</ResultBeforeString>
    <AppendToArgumentsDescription>Repartition Test Time</AppendToArgumentsDescription>
  </ResultsParser>
  <ResultsParser>
    <OutputTemplate>22/08/03 18:06:24 INFO __main__: Inner join test time = #_RESULT_# seconds</OutputTemplate>
    <LineHint>: Inner join test time</LineHint>
    <ResultBeforeString>seconds</ResultBeforeString>
    <AppendToArgumentsDescription>Inner Join Test Time</AppendToArgumentsDescription>
  </ResultsParser>
  <ResultsParser>
    <OutputTemplate>22/08/03 18:06:24 INFO __main__: Broadcast inner join time = #_RESULT_# seconds</OutputTemplate>
    <LineHint>Broadcast inner join time</LineHint>
    <ResultBeforeString>seconds</ResultBeforeString>
    <AppendToArgumentsDescription>Broadcast Inner Join Test Time</AppendToArgumentsDescription>
  </ResultsParser>
</PhoronixTestSuite>
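
Each ResultsParser block locates its line of interest via LineHint and extracts the number immediately preceding the ResultBeforeString token ("seconds"). A rough standalone equivalent of one parser, as a sketch rather than how PTS implements it internally:

# Pull the 'Group By test time' result (in seconds) out of a run log
grep 'Group By test time' "$LOG_FILE" | sed -n 's/.*= \([0-9.]*\) seconds.*/\1/p'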
test-definition.xml
<?xml version="1.0"?>
<!--Phoronix Test Suite v10.8.4-->
<PhoronixTestSuite>
  <TestInformation>
    <Title>Apache Spark</Title>
    <AppVersion>3.3</AppVersion>
    <Description>This is a benchmark of Apache Spark with its PySpark interface. Apache Spark is an open-source unified analytics engine for large-scale data processing and dealing with big data. This test profile benchmarks Apache Spark in a single-system configuration using spark-submit. The test makes use of DIYBigData's pyspark-benchmark (https://github.com/DIYBigData/pyspark-benchmark/) for generating the test data and running various Apache Spark operations.</Description>
    <ResultScale>Seconds</ResultScale>
    <Proportion>LIB</Proportion>
    <TimesToRun>3</TimesToRun>
  </TestInformation>
  <TestProfile>
    <Version>1.0.0</Version>
    <SupportedPlatforms>Linux</SupportedPlatforms>
    <SoftwareType>Application</SoftwareType>
    <TestType>System</TestType>
    <License>Free</License>
    <Status>Verified</Status>
    <ExternalDependencies>java, python</ExternalDependencies>
    <EnvironmentSize>2100</EnvironmentSize>
    <ProjectURL>https://spark.apache.org/</ProjectURL>
    <RepositoryURL>https://github.com/apache/spark</RepositoryURL>
    <Maintainer>Michael Larabel</Maintainer>
  </TestProfile>
  <TestSettings>
    <Option>
      <DisplayName>Row Count</DisplayName>
      <Identifier>rows</Identifier>
      <ArgumentPrefix>-r </ArgumentPrefix>
      <Menu>
        <Entry>
          <Name>1000000</Name>
          <Value>1000000</Value>
        </Entry>
        <Entry>
          <Name>10000000</Name>
          <Value>10000000</Value>
        </Entry>
        <Entry>
          <Name>20000000</Name>
          <Value>20000000</Value>
        </Entry>
        <Entry>
          <Name>40000000</Name>
          <Value>40000000</Value>
        </Entry>
      </Menu>
    </Option>
    <Option>
      <DisplayName>Partitions</DisplayName>
      <Identifier>partitions</Identifier>
      <ArgumentPrefix>-p </ArgumentPrefix>
      <Menu>
        <Entry>
          <Name>100</Name>
          <Value>100</Value>
        </Entry>
        <Entry>
          <Name>500</Name>
          <Value>500</Value>
        </Entry>
        <Entry>
          <Name>1000</Name>
          <Value>1000</Value>
        </Entry>
        <Entry>
          <Name>2000</Name>
          <Value>2000</Value>
        </Entry>
      </Menu>
    </Option>
  </TestSettings>
</PhoronixTestSuite>
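
The rows and partitions identifiers can also be pre-seeded for non-interactive runs; a hedged example assuming the Phoronix Test Suite's standard PRESET_OPTIONS mechanism (verify the exact syntax against your PTS version's batch documentation):

PRESET_OPTIONS="spark.rows=40000000;spark.partitions=2000" phoronix-test-suite batch-benchmark pts/spark-1.0.0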