Dealing with Verification Data Overload

SHELLEY LAMBERT
LAN XIA
IBM RUNTIME TECHNOLOGIES
NOV 2019
@SHELLEYMLAMBERT
@BONJOURLAN
DEALING WITH
VERIFICATION DATA OVERLOAD

AGENDA
• THE SCOPE
• TEST FRAMEWORK (TESTKITGEN)
• JENKINS BUILDS
• TEST RESULT SUMMARY SERVICE (TRSS)
• DATA REFINERY EXPERIMENTS
• PLANS FORWARD

THE SCOPE
AdoptOpenJDK
• ENSURING FREE AND VERIFIED JAVA™ FOR THE COMMUNITY
• PROJECTS: ECLIPSE OMR, ECLIPSE OPENJ9, ADOPTOPENJDK
• 6+ JENKINS SERVERS

DEGREES OF FREEDOM
• JDK Implementations: OpenJ9, Hotspot, SAP, Corretto, RI
• JDK Versions: 8, 9, 10, 11, 12, 13, + (6 versions)
• Test Categories: openjdk, functional, perf, system, external
• Platforms: osx, aix, win, xlinux, plinux, aarch
• 58 impl_platform
• 250,000 tests
• 87,000,000 tests
• Impl_platform x testLevels x testGroups x versions: 58 x 2 x 3 x 6 x 10M = 20G+ test output per nightly build
• Plus PR builds, promotion builds and personal builds
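
The arithmetic on this slide can be checked with a short sketch. The variable names are illustrative. Two readings are assumptions rather than statements from the slide: that "10M" means roughly 10 MB of output per test job, and that 87,000,000 is 58 impl_platform combinations x 6 versions x 250,000 tests.

```python
# Back-of-the-envelope check of the slide's numbers.
# Assumption: "10M" means about 10 MB of output per test job.
impl_platforms = 58      # JDK implementation/platform combinations
test_levels = 2
test_groups = 3
versions = 6
output_per_job_mb = 10   # assumed unit: megabytes

jobs = impl_platforms * test_levels * test_groups * versions
nightly_output_gb = jobs * output_per_job_mb / 1000

# Assumption: 87,000,000 = 58 impl_platforms x 6 versions x 250,000 tests.
tests_per_combination = 250_000
total_tests = impl_platforms * versions * tests_per_combination

print(jobs)               # 2088
print(nightly_output_gb)  # 20.88
print(total_tests)        # 87000000
```

Under these assumptions the product lands just above the "20G+" quoted on the slide, before counting PR, promotion and personal builds.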

GATHER GREAT TESTS
• Test categories: functional, openjdk, perf, external, system

GATHER GREAT TESTS
• testNG, cmdlinetester
• STF
• junit & others
• Assorted benchmarks
• Jtreg, testNG

ADOPTOPENJDK QUALITY ASSURANCE
(AQA)
• “Make quality certain to happen”
• Testing against a wide set of criteria representing actual business requirements to identify binaries ready for production usage
Today
• Functional correctness
  • OpenJDK regression (open)
  • Oracle JCK (closed)
  • Builder-specific testing (unknown)
Roadmap
• Security
  • Passes known vulnerability tests
• Functional correctness
  • OpenJDK regression
  • Eclipse functional
  • Application & framework tests
• Performance
  • Published metrics
  • Achieves minimum throughput scores
• Scalability & durability
  • Load & stress testing

AQA MANIFESTO
• open & transparent
• diverse & robust set of test suites
• evolution alongside implementations
• continual investment
• process to modify
• codecov & other metrics
• comparative analysis
• portable
• tag & publish

INNOVATE AND COLLABORATE
• Reactive systems
• Latitude
• Flexible
• Common
• Standardized
• Simple

GRANULARITY
• Specific testcases/groups: functional/system/openjdk/perf/external
• Different levels: sanity/extended/special
• Different versions: 8/11/13/14…
• Different implementations: openj9/hotspot/ibm/corretto/sap
• Different iterations: 1/2/n…
• Different features: AOT/JITAAS
• With/Without native test image
• …

CONSOLIDATE AND CURATE
• TestKitGen brings together:
  • testNG, cmdlinetester
  • STF
  • junit & others
  • Assorted benchmarks
  • Jtreg, testNG

TEST FRAMEWORK (TESTKITGEN)

GROUPING & GRANULARITY
• group=openjdk
• levels=sanity.openjdk, extended.openjdk, special.openjdk
• targets=tests in playlist file
• jdk_awt, jdk_math, jdk_lang, etc.
• jdk_custom=CUSTOM_TARGET env var
• set to individual directories or classes
• Example, from group down to a single test: openjdk > sanity.openjdk > jdk_math > java/math/BigDecimal/NegateTests.java

ADOPTOPENJDK CI PIPELINE
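• Stages: Build, Test, Deploy
• Test groups run in parallel: openjdk, functional, system, perf, external
• Levels within a group, e.g. system: sanity.system, extended.system, special.system
• Targets within a level, e.g. lambdaLoadTest, mathLoadTest, …, daaLoadTest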
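
As a rough illustration of running the test groups in parallel, the sketch below fans out one job per group with a thread pool. `run_group` is a hypothetical placeholder; in the real pipeline this step would be a Jenkins build, not a Python function.

```python
from concurrent.futures import ThreadPoolExecutor

TEST_GROUPS = ["openjdk", "functional", "system", "perf", "external"]

def run_group(group):
    """Hypothetical stand-in for launching one test job for a group."""
    return group, "sanity." + group

def run_test_stage(groups):
    # Groups are independent, so the stage lasts about as long as the slowest group.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        return dict(pool.map(run_group, groups))

print(run_test_stage(TEST_GROUPS))
```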

JENKINS SCRIPTS FOR TESTING
• Repo: https://github.com/AdoptOpenJDK/openjdk-tests
• One script (JenkinsfileBase) for all test builds:
  • Nightly/release
  • Pull Request
  • Promotion
  • Personal/Grinder

SURVEY OF TESTS: CI.ADOPTOPENJDK.NET
Categorize test builds based on JDK Version, JDK Impl, test category and platform

TAP & JUNIT PLUGIN
Standardize output
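
To show what a standardized format buys, here is a minimal sketch of reading TAP (Test Anything Protocol) result lines into uniform (name, status) pairs. The test names in the sample are made up, and the parser covers only plain `ok` / `not ok` lines and the `# SKIP` directive, not the whole protocol.

```python
import re

# Matches TAP test lines such as "ok 1 - name" or "not ok 2 - name # SKIP reason".
TAP_LINE = re.compile(r"^(not ok|ok)\b\s*(\d+)?\s*-?\s*(.*)$")

def parse_tap(text):
    """Return a list of (test name, status) pairs from TAP output."""
    results = []
    for line in text.splitlines():
        match = TAP_LINE.match(line.strip())
        if not match:
            continue  # plan lines ("1..3"), diagnostics and blank lines
        status_word, _number, description = match.groups()
        name, _, directive = description.partition("#")
        if directive.strip().upper().startswith("SKIP"):
            status = "SKIPPED"
        else:
            status = "PASSED" if status_word == "ok" else "FAILED"
        results.append((name.strip(), status))
    return results

# Made-up sample output for illustration.
sample = """1..3
ok 1 - jdk_math_0
not ok 2 - jdk_lang_0
ok 3 - jdk_awt_0 # SKIP headless machine
"""
print(parse_tap(sample))
```

Once every framework's output is reduced to the same pairs, downstream tooling can summarize results without knowing which framework produced them.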

ARCHIVE DATA
• Archive test data from failed tests onto Jenkins master or Artifactory:
• Test logs/output files
• Diagnostic files (core/trace/javacore files)
• TAP file, JUnit *.jtr, *.xml files
• Test repo SHA
Minimize stored artifacts

ADOPTOPENJDK TESTING
• Test machines: vm, docker, physical machine
• Jenkins Server
• Artifactory
• How to monitor the results?

TEST RESULT SUMMARY SERVICE (TRSS)
• Monitors multiple Jenkins servers
• Personalized Dashboard
• Provides filtering, sorting, comparing and searching features
• Provides history for triaging and performance trends
• Available at https://trss.adoptopenjdk.net/
• Git repo: https://github.com/AdoptOpenJDK/openjdk-test-tools

TRSS OVERVIEW
• Client
• TRSS Server: frontend, backend
• Multiple Jenkins Servers
• MongoDB

TRSS: PERSONALIZED DASHBOARD
Personalize views, only what you need

TRSS: PERSONALIZED DASHBOARD
Grid view for test builds summary

TRSS: MONITOR JENKINS PIPELINE BUILDS
Aggregate results from top-level pipelines

TRSS: TEST BUILDS RESULT
Test Result Summary

TRSS: TESTS RESULT
Filter and sort on test name, duration, and test results
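
Filtering and sorting on these three fields can be sketched as follows. `ResultRow` and `filter_and_sort` are hypothetical names for illustration and are not taken from the TRSS code base.

```python
from dataclasses import dataclass

@dataclass
class ResultRow:
    name: str
    duration_s: float
    status: str  # "PASSED", "FAILED" or "SKIPPED"

def filter_and_sort(rows, status=None, name_contains="", sort_key="duration_s", descending=True):
    """Keep rows matching the filters, then order them by the chosen field."""
    selected = [
        row for row in rows
        if (status is None or row.status == status) and name_contains in row.name
    ]
    return sorted(selected, key=lambda row: getattr(row, sort_key), reverse=descending)

# Made-up rows for illustration.
rows = [
    ResultRow("jdk_math_0", 120.0, "PASSED"),
    ResultRow("jdk_lang_0", 900.5, "FAILED"),
    ResultRow("jdk_awt_0", 30.2, "FAILED"),
]
for row in filter_and_sort(rows, status="FAILED"):
    print(row.name, row.duration_s)
```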

TRSS: TEST RESULT ACROSS ALL BUILDS
Compare test results for all JDK impls, JDK versions and platforms

TRSS: SEARCH TEST
Search test output among different test builds

TRSS: TEST COMPARE
Diff test results between two different builds
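
A diff of two builds can be sketched as a comparison of two {test name: status} mappings. The function name, the mapping layout and the sample data are assumptions for illustration, not the TRSS implementation.

```python
def diff_results(build_a, build_b):
    """Compare two {test name: status} mappings and report what changed."""
    changed = {
        name: (build_a[name], build_b[name])
        for name in build_a.keys() & build_b.keys()
        if build_a[name] != build_b[name]
    }
    return {
        "changed": changed,
        "only_in_a": sorted(build_a.keys() - build_b.keys()),
        "only_in_b": sorted(build_b.keys() - build_a.keys()),
    }

# Made-up results for two builds.
nightly = {"jdk_math_0": "PASSED", "jdk_lang_0": "PASSED", "jdk_awt_0": "FAILED"}
pr_build = {"jdk_math_0": "PASSED", "jdk_lang_0": "FAILED", "jdk_net_0": "PASSED"}
print(diff_results(nightly, pr_build))
```

Reporting only the changed and missing tests keeps the comparison readable even when each build contains thousands of results.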

TRSS: PERF DASHBOARD
Visualize data graphically

LET US COUNT THE WAYS
• categorize
• standardize
• minimize
• personalize
• aggregate
• summarize
• filter
• sort
• compare
• search
• diff
• visualize
• model

WHAT IS DEEP LEARNING?
Deep learning is a subset of ML algorithms distinguished by:
• Artificial neural networks (ANNs), loosely based on the structure and function of the brain
• Multiple layers of processing units (“neurons”), where the output of one layer is the input to another layer
• Modes of learning: supervised (regression, classification) or unsupervised (pattern analysis)

INITIAL DL EXPERIMENTS
• Preprocess testOutput based on our own index of feature keywords.
• Original testOutput: “Running test TestIBMJlmLocal_0: ERROR code 3, FAILED test 1, running second test, exception ValueError, exit code 1”
  Label: “FAILED”
• Preprocessed testOutput: “[4, 7, 10]”
  Label: “1”
• Padded preprocessed testOutput for the Deep Learning model: “[4, 7, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…]”
  Label: “1”
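
The preprocessing and padding steps can be sketched as below. The keyword index is hypothetical: the talk does not list the real one, so the three entries are chosen only so that the slide's sample maps to [4, 7, 10]. The label for passed tests (0) is likewise an assumption, since the slide shows only FAILED mapped to 1.

```python
import re

# Hypothetical keyword index; the real index is not given in the talk.
# Chosen so that the slide's sample output maps to [4, 7, 10].
KEYWORD_INDEX = {"error": 4, "failed": 7, "exception": 10}
LABELS = {"PASSED": 0, "FAILED": 1}  # PASSED -> 0 is an assumption

def preprocess(test_output):
    """Replace each known keyword with its index, dropping every other token."""
    tokens = re.findall(r"[A-Za-z_]+", test_output.lower())
    return [KEYWORD_INDEX[t] for t in tokens if t in KEYWORD_INDEX]

def pad(sequence, length, pad_value=0):
    """Right-pad (or truncate) so every sample has the same length."""
    return (sequence + [pad_value] * length)[:length]

output = ("Running test TestIBMJlmLocal_0: ERROR code 3, FAILED test 1, "
          "running second test, exception ValueError, exit code 1")
encoded = preprocess(output)
print(encoded)           # [4, 7, 10]
print(pad(encoded, 8))   # [4, 7, 10, 0, 0, 0, 0, 0]
print(LABELS["FAILED"])  # 1
```

Padding to a fixed length is what lets variable-length test output be fed to a model that expects inputs of one shape.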

INITIAL DL EXPERIMENTS
• Training accuracy and validation accuracy: our evaluation result is 98.897% with 7800 training samples (3900 failed and 3900 passed) and 7800 test samples
• Training loss and validation loss

MODEL BUILDING
• Things we know (input layer): JVM version, Variants used, Failure expression, Platform, JVM Impl, Machine ‘age’, Failure age, PR list, Defect category, SHAs
• Things we want to know (output layer): Bug prediction scores, Best next action, Rate value of test

Deep Learning for Fuzzing Java Compilers
• Integrated DeepSmith with TKG and TRSS
• Easily run thousands of DeepSmith tests in Jenkins with different JDK versions / impls / JVM options
• Compare and monitor test outputs using TRSS

PLANS FORWARD
• Test smarter (as test volume increases)
• Change-based testing
• Bug prediction service
• Enhance TRSS with analytics services
• Build skills and continue model/deploy, observe & measure
• Innovate/Collaborate
• AI-driven fuzz testing (with Professor Hugh Leather)
• Test generation service (application of CTD)
• Leverage & deploy useful models in open projects

@ShelleyMLambert
@bonjourlan
AdoptOpenJDK: https://adoptopenjdk.net/
Git repo:
https://github.com/AdoptOpenJDK/openjdk-tests
https://github.com/AdoptOpenJDK/openjdk-test-tools
https://github.com/AdoptOpenJDK/TKG
https://github.com/eclipse/openj9
THANK YOU!
