© 2017 – All Rights Reserved
Acceleration of Generic SPARK
Workloads via a “Sea of Cores”
Scalable Compute Fabric
Paul Master
CTO
pmaster@cornami.com
Using “Linear Increases in Performance” to
“Process Exponentially more Data”
Year to Year Trends
Growth of Data
Exponential
Transistors Per Chip
Exponential
35 Years of Microprocessor Trend Data
CPU Performance
Flat
The Workloads Have Changed
• The run-time characteristics of these two types of workloads are
NOT the same.
• So let’s look at a non-traditional processor architecture that is
tuned for Big Data / ML workloads.
etc…
So, the Architecture Has to Change
Intel Haswell CPU
200M transistors per core
etc…
“Sea of Cores”
150-200k transistors per core
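As a rough back-of-the-envelope check on the two per-core figures above, the sketch below divides one transistor budget by the other. It counts transistors only: interconnect, caches, and I/O are ignored, and the function and constant names are illustrative assumptions rather than anything from the talk.

```python
# Per-core transistor counts quoted on the slide.
BIG_CORE_TRANSISTORS = 200_000_000        # Intel Haswell, per core
SMALL_CORE_TRANSISTORS = (150_000, 200_000)  # "Sea of Cores", per core (range)


def small_cores_per_big_core(big: int, small: int) -> int:
    """Whole small cores that fit in one big core's transistor budget."""
    return big // small


# The larger small-core size gives the lower bound, and vice versa.
low = small_cores_per_big_core(BIG_CORE_TRANSISTORS, max(SMALL_CORE_TRANSISTORS))
high = small_cores_per_big_core(BIG_CORE_TRANSISTORS, min(SMALL_CORE_TRANSISTORS))

print(f"{low} to {high} small cores per big-core transistor budget")
```

On transistor count alone, this puts the ratio at roughly a thousand small cores per large core.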
Hardware:
• At 22nm one can fit 52 ARM Cortex-A7 cores in the space of 1 Intel
Haswell core
• What interconnect(s), what type/size of caches, what
coherency, what I/O….
Software:
• We need a way of parallelizing workloads to run
across many, many small cores instead of a few large
cores
• It’s got to be software acceleration (<CR> and go)
• We need a large “software” base of real world
applications that matter
Oh Wait! We have
Usage Model for Dense Computational Fabric
Performance Comparison between a standard dual
socket 1U Server (16 core) vs. a 1U Server with a
TruStream Dense Computational Fabric (1000 cores)
Live Demonstration:
Yahoo Streaming Benchmark
What is it?
Yahoo Streaming Benchmark
Measures Real-Time Mobile
Advertising performance
Executive Summary…
“Due to a lack of real-world
streaming benchmarks, we
developed one to compare
Apache Flink, Apache Storm
and Apache Spark
Streaming.”
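For readers unfamiliar with the benchmark, its core computation can be sketched in a few lines. This is a minimal single-process illustration, assuming the commonly described pipeline (keep only "view" events, map each ad to its campaign, count views per campaign in 10-second windows). It is not the benchmark's actual code and not the Spark job shown in the demo; the field names and the function name are assumptions.

```python
from collections import Counter

WINDOW_MS = 10_000  # assumed 10-second tumbling windows


def count_views_per_campaign_window(events, ad_to_campaign):
    """Count 'view' events per (campaign, window start in ms)."""
    counts = Counter()
    for event in events:
        # Filter: only ad views are counted.
        if event["event_type"] != "view":
            continue
        # Join: look up the campaign that owns this ad.
        campaign = ad_to_campaign.get(event["ad_id"])
        if campaign is None:
            continue
        # Window: round the event time down to its window start.
        window_start = event["event_time"] // WINDOW_MS * WINDOW_MS
        counts[(campaign, window_start)] += 1
    return counts


if __name__ == "__main__":
    sample_events = [
        {"ad_id": "ad1", "event_type": "view", "event_time": 1_000},
        {"ad_id": "ad1", "event_type": "click", "event_time": 2_000},
        {"ad_id": "ad2", "event_type": "view", "event_time": 9_999},
        {"ad_id": "ad1", "event_type": "view", "event_time": 12_000},
    ]
    campaigns = {"ad1": "c1", "ad2": "c1"}
    print(count_views_per_campaign_window(sample_events, campaigns))
```

Each event is handled independently until the final count, which is the kind of data-parallel work that can be spread across many small cores.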
“Mobile ad revenue was about 84% of total ad
revenue, an increase from about 76% of total
ad revenue in the year-ago quarter.
That means Facebook Inc. (NASDAQ:FB) had
about $5.42 billion on mobile ad revenue,
which was well ahead of the StreetAccount
average of $4.84 billion.”
http://www.valuewalk.com/2016/07/facebook-inc-fb-earnings-beat/
• Our TruStream™ Compute Fabric, a dense computational
fabric consisting of a non-Von Neumann “Sea of Cores”,
produced a stepwise increase in Yahoo Streaming
Benchmark performance.
• Greater than 40x speedup. This is the fastest Yahoo
Streaming Benchmark result reported to date.
• This was done by transparently accelerating the generic
Scala code within the Spark framework.
Stepwise Performance Increase
Growth of Data
2017
Exponential
Transistors Per Chip
Exponential
2017
CORNAMI’s “Sea of Cores” Solution
Flat → Exponential
2017
Thank You!
Paul Master
pmaster@cornami.com
RESULTS:

Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
