Apache Geode (incubating)
Hands-on Introduction & Hackathon Kickoff
Ashvin Agrawal (@aasoj), William Markito (@william_markito)
Powered by Pivotal Open Source Hub (POSH)
Agenda
• Hackathon details
• Apache Geode introduction
  • History
  • Key features and components
  • Roadmap
• Hands-on lab
  • Build & run
  • Starting a cluster
  • Using Docker for clustering
  • Your first app
• Q&A
Hackathon details
Powered by Pivotal Open Source Hub (POSH)
http://ambitious-apps.challengepost.com/
Introduction
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency, and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture
One size fits all?
Cost of sorting is O(n log n)
• Data quality and quantity differences
• Eventual consistency
• Response time expectation
• Scalability challenges: disk, memory, network, and external systems
Incubating… but rock solid
• 1000+ systems in production (real customers)
• Cutting-edge use cases
• 2004
  • Massive increase in data volumes
  • Falling margins per transaction
  • Increasing cost of IT maintenance
  • Need for elasticity in systems
  • Customers: financial services providers (every major Wall Street bank), Department of Defense
• 2008
  • Real-time response needs
  • Time-to-market constraints
  • Need for flexible data models across the enterprise
  • Distributed development
  • Persistence + in-memory
  • Customers: largest travel portal, airlines, trade clearing, online gambling
• 2014
  • Global data visibility needs
  • Fast ingest needs for data
  • Need to allow devices to hook into enterprise data
  • Always on
  • Customers: largest telcos, large manufacturers, largest payroll processor, auto insurance giants, largest rail systems on earth
Incubating… but rock solid
• 17 billion records in memory
  • GE Power & Water's Remote Monitoring & Diagnostics Center
• 3 TB operational data in memory, 400 TB archived
  • China Railways
• 4.6 million transactions a day / 40K transactions a second
  • China Railways
Designed for High Performance
• Performance-optimized persistence
• Configurable consistency
• Elastic capacity
• Latency-minimizing distribution
• Heterogeneous deployment
Approximate latencies: L2 cache ~10 ns, memory ~100 ns, network <1 ms, disk ~10 ms
Concepts
• Cache
• Region
• Member
• Client Cache
• Functions
• Listeners
Concepts
• Cache
  • In-memory storage and management for your data
  • Configurable through XML, Spring, Java API, or CLI
  • Collection of Regions
[Diagram: a JVM hosting a Cache that contains several Regions]
Concepts
• Region
  • Distributed java.util.Map on steroids (key/value)
  • Consistent API regardless of where or how data is stored
  • Observable (reactive)
  • Highly available, with redundant copies on cache members
[Diagram: a Region inside a Cache inside a JVM, shown as a java.util.Map]
  Key   Value
  K01   May
  K02   Tim
Concepts
• Region
  • Local, Replicated, or Partitioned
  • In-memory or persistent
  • Redundant
  • LRU
  • Overflow
[Diagram: two JVMs, each hosting a Cache with a Region holding the same java.util.Map entries (K01 May, K02 Tim)]
Region shortcuts:
  LOCAL
  LOCAL_HEAP_LRU
  LOCAL_OVERFLOW
  LOCAL_PERSISTENT
  LOCAL_PERSISTENT_OVERFLOW
  PARTITION
  PARTITION_HEAP_LRU
  PARTITION_OVERFLOW
  PARTITION_PERSISTENT
  PARTITION_PERSISTENT_OVERFLOW
  PARTITION_PROXY
  PARTITION_PROXY_REDUNDANT
  PARTITION_REDUNDANT
  PARTITION_REDUNDANT_HEAP_LRU
  PARTITION_REDUNDANT_OVERFLOW
  PARTITION_REDUNDANT_PERSISTENT
  PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
  REPLICATE
  REPLICATE_HEAP_LRU
  REPLICATE_OVERFLOW
  REPLICATE_PERSISTENT
  REPLICATE_PERSISTENT_OVERFLOW
  REPLICATE_PROXY
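The LRU bullet and the *_HEAP_LRU shortcuts refer to evicting the least recently used entries when memory runs short. The following is a minimal single-JVM sketch of that idea using only java.util, not the Geode API. The class name LruRegionSketch, the fixed entry limit, and the value "Ann" are assumptions made for illustration; Geode's heap LRU reacts to heap usage rather than to an entry count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a local map with entry-count LRU eviction.
// It is not a Geode region and does not use any Geode class.
class LruRegionSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruRegionSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruRegionSketch<String, String> region = new LruRegionSketch<>(2);
        region.put("K01", "May");
        region.put("K02", "Tim");
        region.get("K01");        // touch K01, so K02 becomes the least recently used entry
        region.put("K03", "Ann"); // exceeds the limit, so K02 is evicted
        System.out.println(region.keySet());
    }
}
```

An overflow region would write the evicted entry to disk instead of discarding it; that step is left out of the sketch.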
Concepts
• Persistent Regions
  • Durability
  • WAL (write-ahead log) for efficient writing
  • Consistent recovery
  • Compaction
[Diagram: on Member 1, "Put k4->v7" is appended to Oplog3.crf as "Modify k4->v7"; the older Oplog2.crf holds "Modify k1->v5", "Create k6->v6", "Create k2->v2", and "Create k4->v4". Server 1 through Server N each host a JVM with a Cache and a Region (K01 May, K02 Tim).]
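The persistence bullets (append-only writing, consistent recovery, compaction) can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not Geode's oplog format: the class OplogSketch and its methods are hypothetical names, and the "log" is a plain list rather than a file on disk.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an append-only operation log; not Geode's oplog format.
class OplogSketch {
    // Each record is one logged write: key and the value written for it.
    private final List<Map.Entry<String, String>> log = new ArrayList<>();

    // A write only appends a record; nothing is updated in place.
    void put(String key, String value) {
        log.add(new SimpleImmutableEntry<>(key, value));
    }

    // Recovery replays the records in order, so the last write for each key wins.
    Map<String, String> recover() {
        Map<String, String> state = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : log) {
            state.put(record.getKey(), record.getValue());
        }
        return state;
    }

    // Compaction rewrites the log so it holds only the live value of each key.
    void compact() {
        Map<String, String> live = recover();
        log.clear();
        for (Map.Entry<String, String> e : live.entrySet()) {
            log.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
    }

    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        OplogSketch oplog = new OplogSketch();
        oplog.put("k4", "v4");
        oplog.put("k2", "v2");
        oplog.put("k4", "v7"); // supersedes the earlier k4 record
        oplog.compact();       // the stale k4->v4 record is dropped
        System.out.println(oplog.size() + " records, state = " + oplog.recover());
    }
}
```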
Concepts
• Member
  • A process that has a connection to the system
  • A process that has created a cache
  • Embeddable within your application
[Diagram: Client, Locator, Server]
Concepts
• Client cache
  • A process connected to the Geode server(s)
  • Can have a local copy of the data
  • Can be notified about events on the servers
[Diagram: an Application with a Client Cache connected to a GemFire Server that hosts several Regions]
Concepts
• Functions
  • Used for distributed concurrent processing (Map/Reduce, stored procedure)
  • Highly available
  • Data oriented
  • Member oriented
[Diagram: Submit (f1); functions f1, f2, … fn; Execute]
Concepts
• Functions
[Diagram, steps 1 to 6: a client region calls FunctionService.onRegion.withFilter.execute with filter = keys X, Y; servers in the distributed system, acting as partitioned-region data accessors or as data stores X, Y, and Z, pass on the execute calls and return results; the client reads them with ResultCollector.getResult.]
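The data-oriented execution in this diagram (send the function to wherever the filtered keys live, then collect the results) can be imitated in a single process. The sketch below does not call the Geode API: PartitionedFunctionSketch, executeWithFilter, and sumValues are hypothetical names, the partitions are plain in-memory maps, and hash routing is an assumed placement scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical single-process illustration of data-oriented function execution.
class PartitionedFunctionSketch {
    private final List<Map<String, Integer>> partitions = new ArrayList<>();

    PartitionedFunctionSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Route a key to the partition that owns it (assumed scheme: hash modulo count).
    private int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, int value) {
        partitions.get(partitionOf(key)).put(key, value);
    }

    // Run fn once per partition that owns at least one filter key. Each call sees
    // only that partition's filtered entries; the partial results are collected.
    <R> List<R> executeWithFilter(Set<String> filter, Function<Map<String, Integer>, R> fn) {
        Map<Integer, Map<String, Integer>> localInputs = new TreeMap<>();
        for (String key : filter) {
            int p = partitionOf(key);
            Integer value = partitions.get(p).get(key);
            if (value != null) {
                localInputs.computeIfAbsent(p, x -> new HashMap<>()).put(key, value);
            }
        }
        List<R> results = new ArrayList<>();
        for (Map<String, Integer> input : localInputs.values()) {
            results.add(fn.apply(input));
        }
        return results;
    }

    // Example function: sum the values that are local to one partition.
    static Integer sumValues(Map<String, Integer> local) {
        int sum = 0;
        for (int v : local.values()) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        PartitionedFunctionSketch region = new PartitionedFunctionSketch(3);
        region.put("X", 1);
        region.put("Y", 2);
        region.put("Z", 3);
        Set<String> filter = new HashSet<>(Arrays.asList("X", "Y"));
        List<Integer> partials = region.executeWithFilter(filter, PartitionedFunctionSketch::sumValues);
        System.out.println("partial results = " + partials);
    }
}
```

The partition holding Z never runs the function, which is the point of the filter: work is shipped only to the data it needs.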
Concepts
• Listeners
  • CacheWriter / CacheListener
  • AsyncEventListener (queue / batch)
  • Parallel or Serial
  • Conflation
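Conflation and batching on an asynchronous event queue can be illustrated as follows. This is a sketch of the general idea, not of Geode's AsyncEventListener machinery: ConflatingQueueSketch, enqueue, and drainBatch are hypothetical names, and moving a conflated event to the tail of the queue is a choice made here for simplicity.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a conflating event queue that is drained in batches.
class ConflatingQueueSketch {
    // Insertion-ordered map: at most one pending event per key.
    private final Map<String, String> pending = new LinkedHashMap<>();

    // Enqueue an update; an older pending update for the same key is discarded.
    void enqueue(String key, String value) {
        pending.remove(key); // drop the stale event so the new one goes to the tail
        pending.put(key, value);
    }

    // Remove and return up to batchSize events, oldest first.
    List<Map.Entry<String, String>> drainBatch(int batchSize) {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator();
        while (it.hasNext() && batch.size() < batchSize) {
            Map.Entry<String, String> e = it.next();
            batch.add(new SimpleImmutableEntry<>(e)); // copy before removing from the queue
            it.remove();
        }
        return batch;
    }

    public static void main(String[] args) {
        ConflatingQueueSketch queue = new ConflatingQueueSketch();
        queue.enqueue("k1", "v1");
        queue.enqueue("k2", "v2");
        queue.enqueue("k1", "v5"); // conflates with the pending k1->v1 event
        System.out.println(queue.drainBatch(10));
    }
}
```

A listener that receives such a batch sees one event per key, which reduces the work sent to a slow downstream system at the cost of losing intermediate values.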
Hands on
Hands-on: Build & run
• Clone & Build
  git clone https://github.com/apache/incubator-geode
  cd incubator-geode
  ./gradlew build -Dskip.tests=true
• Start a server
  cd gemfire-assembly/build/install/apache-geode
  ./bin/gfsh
  gfsh> start locator --name=locator
  gfsh> start server --name=server
  gfsh> create region --name=myRegion --type=REPLICATE
Hands-on: Docker
• Containers
  • FreeBSD Jails (2000)
  • Solaris Zones (2004)
  • Docker (2013)
• Operating-system-level virtualization
• Isolated user-space instances
* https://linuxcontainers.org/
Container vs VM
“...while the hypervisor abstracts the entire device, containers just abstract the operating system kernel”
Hands-on: Docker & Compose
• Single instance
  docker run -it apachegeode/geode:nightly gfsh
• Cluster
  docker-compose up
• Scale
  docker-compose scale server=3
Hands-on: Application
• Teeny URL
  • Fast response time
  • Statistics
    • Hits
    • User agent?
    • IPs?
  • URL will last for 5 minutes
• Distribute data & load
• Highly scalable
Operations: createURL, getURL, stats
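As a starting point for the lab, the three operations can be prototyped against plain maps before moving the data into Geode regions. The sketch below is a single-JVM assumption, not the lab's reference solution: only the names createURL, getURL, and stats come from the slide, while the class name, the base-36 ids, and passing the current time as a parameter are choices made here to keep the example testable. In a Geode-backed version, region entry expiration could take over the five-minute lifetime.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-JVM prototype of the Teeny URL app; it does not use Geode.
class TeenyUrlSketch {
    static final long TTL_MILLIS = 5 * 60 * 1000L; // "URL will last for 5 minutes"

    // A stored URL together with the time at which it stops resolving.
    private static final class Entry {
        final String url;
        final long expiresAt;

        Entry(String url, long expiresAt) {
            this.url = url;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> urls = new HashMap<>();
    private final Map<String, Integer> hits = new HashMap<>();
    private int nextId = 0;

    // Store the long URL and return its short id.
    String createURL(String longUrl, long nowMillis) {
        String id = Integer.toString(nextId++, 36);
        urls.put(id, new Entry(longUrl, nowMillis + TTL_MILLIS));
        return id;
    }

    // Resolve a short id; returns null if it is unknown or has expired.
    String getURL(String id, long nowMillis) {
        Entry e = urls.get(id);
        if (e == null || nowMillis >= e.expiresAt) {
            urls.remove(id);
            return null;
        }
        hits.merge(id, 1, Integer::sum); // count successful lookups only
        return e.url;
    }

    // Number of successful lookups recorded for a short id.
    int stats(String id) {
        return hits.getOrDefault(id, 0);
    }

    public static void main(String[] args) {
        TeenyUrlSketch app = new TeenyUrlSketch();
        long now = System.currentTimeMillis();
        String id = app.createURL("http://geode.incubator.apache.org", now);
        System.out.println(id + " -> " + app.getURL(id, now) + ", hits = " + app.stats(id));
    }
}
```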
Roadmap
• HDFS Persistence
• Off-heap memory storage
• Lucene Search
• Spark Integration
• Cloud Foundry service
How to Contribute
• Code
  • New features
  • Bug fixes
  • Writing tests
• Documentation
  • Wiki
  • Web site
  • User guide
• Community
  • Join the mailing list
  • Ask or answer
  • Join our HipChat
  • Become a speaker
  • Finding bugs
  • Testing an RC/Beta
Links
• JIRA: https://issues.apache.org/jira/browse/GEODE
• Wiki: cwiki.apache.org/confluence/display/GEODE
• GitHub: https://github.com/apache/incubator-geode
• Mailing lists: mail-archives.apache.org/mod_mbox/incubator-geode-dev/
Thank you
http://geode.incubator.apache.org
https://github.com/Pivotal-Open-Source-Hub

Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
