Composing and Scaling Data Platforms
Rahul Kumar
Talk Highlights
Data Representation
Architecture
Parallelism
As software engineers, we are inevitably affected by the tools we surround ourselves with.
Languages, frameworks and processes all act to shape the software we build.
Likewise databases, which have trodden a very specific path, inevitably affect the way we treat mutability and shared state in our applications.
Today’s data platforms range greatly in complexity, from simple caching layers or polyglot persistence right through to wholly integrated data pipelines.
There are many paths.
They go to many different places.
So the aim for this talk is to explain how and why some of these popular approaches work.
This talk is based on Ben Stopford’s actual presentation:
http://www.benstopford.com/2015/04/28/elements-of-scale-composing-and-scaling-data-platforms/
Computers work best with sequential workloads
When we’re dealing with data, we’re really just arranging locality.
Locality to the CPU.
Locality to the other data we need.
Accessing data sequentially is an important component of this.
Computers are just good at sequential operations.
Sequential operations can be predicted.
Random vs Sequential Addressing
If you’re taking data from disk sequentially, it will be pre-fetched into the disk buffer, the page cache and the different levels of CPU caching.
But it does little to help the addressing of data at random, be it in main memory,
on disk or over the network.
In fact pre-fetching actually hinders random workloads as the various
caches and frontside bus fill with data which is unlikely to be used.
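The contrast between the two addressing patterns can be explored with a small experiment. The following is only a rough sketch, not a benchmark from the talk: the function names, the array size and the seed are all assumptions made for illustration. It sums the same in-memory values twice, once visiting indices in order and once in shuffled order. In Python the interpreter overhead hides much of the hardware effect, so the timings should be read as indicative at best.

```python
import random
import time
from array import array


def sum_in_order(values, order):
    """Sum values[i] for each index i, visiting indices in the order given."""
    total = 0
    for i in order:
        total += values[i]
    return total


def compare_access_patterns(n=1_000_000, seed=0):
    """Time the same summation with sequential and shuffled index orders."""
    values = array("q", range(n))      # contiguous 64-bit integers
    sequential = array("q", range(n))  # indices 0, 1, 2, ...
    shuffled_list = list(range(n))
    random.Random(seed).shuffle(shuffled_list)  # same indices, random order
    shuffled = array("q", shuffled_list)

    totals = {}
    timings = {}
    for name, order in (("sequential", sequential), ("random", shuffled)):
        start = time.perf_counter()
        totals[name] = sum_in_order(values, order)
        timings[name] = time.perf_counter() - start
    return totals, timings


if __name__ == "__main__":
    totals, timings = compare_access_patterns()
    # Both orders visit every element exactly once, so the sums must agree.
    assert totals["sequential"] == totals["random"]
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.3f}s")
```

Both passes do the same amount of arithmetic; only the order of memory accesses differs, which is the variable the slide is describing.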
Streaming data sequentially from disk can actually outperform randomly addressed main memory.
So disk may not always be quite the tortoise we think it is, at least not if we can arrange sequential access.
We want to keep writes and reads sequential, as it works well with the hardware.
We can append writes to the end of the file efficiently.
We can read by scanning the file in its entirety.
Any processing we wish to do can happen as the data streams through the CPU.
We might filter, aggregate or even do something more complex.
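The append-and-scan pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the talk: the newline-delimited JSON format, the function names (`append_record`, `scan`, `total_by_key`) and the example field names are all hypothetical. Writes only ever go to the end of the file, reads walk the file front to back, and the filter and aggregation happen in a single pass as records stream by.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one JSON record as a new line at the end of the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def scan(path):
    """Yield every record by reading the file from start to finish."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


def total_by_key(path, key, value, predicate=lambda r: True):
    """Filter and aggregate in one pass while the records stream through."""
    totals = {}
    for record in scan(path):
        if predicate(record):
            totals[record[key]] = totals.get(record[key], 0) + record[value]
    return totals


if __name__ == "__main__":
    # Hypothetical example data: three events appended in arrival order.
    path = os.path.join(tempfile.mkdtemp(), "events.log")
    for user, amount in [("a", 5), ("b", 7), ("a", 3)]:
        append_record(path, {"user": user, "amount": amount})
    print(total_by_key(path, "user", "amount", lambda r: r["amount"] > 4))
```

Because `scan` is a generator, the whole file never needs to fit in memory; only the running totals are held while the data is read sequentially.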
Parallelism
Architecture
Thank You

Composing and Scaling Data Platforms-2015