Composing and Scaling Data Platforms-2015


Rahul Kumar

Talk Highlights
Data Representation
Architecture
Parallelism

As software engineers we are inevitably affected by the tools we surround ourselves with.
Languages, frameworks and processes all act to shape the software we build.

Likewise databases, which have trodden a very specific path, inevitably affect the way we treat mutability and share state in our applications.

Today’s data platforms range greatly in complexity, from simple caching layers or polyglot persistence right through to wholly integrated data pipelines.
There are many paths, and they go to many different places.
The aim of this talk is to explain how and why some of these popular approaches work.
This talk is based on Ben Stopford’s presentation:
http://www.benstopford.com/2015/04/28/elements-of-scale-composing-and-scaling-data-platforms/

Computers work best with sequential workloads
When we’re dealing with data, we’re really just arranging locality.
Locality to the CPU.
Locality to the other data we need.

Accessing data sequentially is an important component of this.
Computers are just good at sequential operations.
Sequential operations can be predicted.

Random vs Sequential Addressing
If you’re taking data from disk sequentially, it will be pre-fetched into the disk buffer, the page cache and the different levels of CPU cache.
But pre-fetching does little to help the addressing of data at random, be it in main memory, on disk or over the network.
In fact, pre-fetching actually hinders random workloads, as the various caches and the front-side bus fill with data which is unlikely to be used.
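
The difference between the two addressing patterns can be sketched in a few lines of Python. This is a rough illustration rather than a benchmark: the function names, the array size and the seed are assumptions made for the example, and interpreter overhead hides much of the hardware effect, so the measured gap will vary by machine.

```python
import random
import time
from array import array


def sum_in_order(values, order):
    """Sum values[i] for every index i, visiting indexes in the given order."""
    total = 0
    for i in order:
        total += values[i]
    return total


def compare_access_patterns(n=1_000_000, seed=42):
    """Time the same sum over sequential and shuffled index orders.

    Returns {"sequential": (total, seconds), "random": (total, seconds)}.
    """
    values = array("q", range(n))      # contiguous block of 64-bit integers
    sequential = list(range(n))        # indexes 0, 1, 2, ... in address order
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # same indexes, unpredictable order

    results = {}
    for name, order in (("sequential", sequential), ("random", shuffled)):
        start = time.perf_counter()
        total = sum_in_order(values, order)
        results[name] = (total, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for name, (total, seconds) in compare_access_patterns().items():
        print(f"{name:>10}: total={total} time={seconds:.3f}s")
```

Both runs touch exactly the same elements and produce the same total; only the order of the visits differs, which is the part the pre-fetcher can or cannot predict.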

Streaming data sequentially from disk can actually outperform randomly addressed main memory.
So disk may not always be quite the tortoise we think it is, at least not if we can arrange sequential access.

We want to keep writes and reads sequential, as this works well with the hardware.
We can append writes to the end of the file efficiently.
We can read by scanning the file in its entirety.
Any processing we wish to do can happen as the data streams through the CPU.
We might filter, aggregate or even do something more complex.
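
The append-and-scan pattern described above might look like the following minimal sketch. The line-delimited JSON format, the file name and the helper names (`append_record`, `scan`, `total_for`) are assumptions chosen for illustration; they do not come from the talk.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Write one record as a new line at the end of the file (sequential write)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def scan(path):
    """Yield every record by reading the file from start to end (sequential read)."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


def total_for(path, key_field, key, value_field):
    """Filter and aggregate in a single pass while the records stream by."""
    return sum(r[value_field] for r in scan(path) if r[key_field] == key)


def demo():
    """Append three hypothetical events, then total the amounts for user "a"."""
    path = os.path.join(tempfile.mkdtemp(), "events.log")
    for user, amount in [("a", 5), ("b", 7), ("a", 11)]:
        append_record(path, {"user": user, "amount": amount})
    return total_for(path, "user", "a", "amount")


if __name__ == "__main__":
    print(demo())
```

Nothing in the file is ever updated in place: writes only go to the end, and every read is a full scan, so both directions stay sequential at the cost of reading records that the filter then discards.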




22. Parallelism