Intel labwork - Bizosys Technologies

Big Join in Hadoop: 3 million positions x 5,000 risk models. Each model consists of 2M products.

Transcript

  • 1. ©2013 BIZOSYS TECHNOLOGIES PRIVATE LIMITED HSearch @ Intel Innovation Lab
  • 2. © 2012 Bizosys Technologies Pvt Ltd. The Use-case
  • 3. Assessing Market Risk of an Investment Portfolio involving 15 billion calculations
  • 4. Big Join in Hadoop: 3 million positions x 5,000 risk models. Each model consists of 2M products. To achieve: 6 months of historical data readily available while calculating risk. Current status: only 5 days of prior data is immediately available; the rest is in archives.
  • 5. Business Benefit
    • Allows broader time-based risk assessment.
    • Solution avoids costly architectures such as in-memory JVM cache-based computing.
  • 6. The Cluster
  • 7. Infrastructure - Metals
    Sl. No. | Description | Machine 1 | Machine 2 | Machine 3 | Machine 4
    1 | Platform | S4600SDP | S4600SDP | S4600SDP | S4600SDP
    2 | Processor Details | Xeon E5-4650, 2.7 GHz, 20M L3 cache, 8 Core | Xeon E5-4650, 2.7 GHz, 20M L3 cache, 8 Core | Xeon E5-4650, 2.7 GHz, 20M L3 cache, 8 Core | Xeon E5-4650, 2.7 GHz, 20M L3 cache, 8 Core
    4 | Memory | 16 x 8GB-PC3L-10600R | 16 x 8GB-PC3L-10600R | 16 x 8GB-PC3L-10600R | 16 x 8GB-PC3L-10600R
    5 | Hard disk | 300GB SAS | 300GB SAS | 300GB SAS | 300GB SAS
    6 |  | 250GB SSD | 250GB SSD | 250GB SSD | 238.5GB SSD (4 x 60GB SSD in LVM)
    7 | OS Details | Redhat Enterprise Linux 6.3 x64 | Redhat Enterprise Linux 6.3 x64 | Redhat Enterprise Linux 6.3 x64 | Redhat Enterprise Linux 6.3 x64
      |  | /boot = 1GB | /boot = 1GB | /boot = 1GB | /boot = 1GB
      |  | swap = 32GB | swap = 32GB | swap = 32GB | swap = 32GB
      |  | /root = 100GB | /root = 100GB | /root = 100GB | /root = 100GB
      |  | /data = 167GB | /data = 167GB | /data = 167GB | /data = 167GB
      |  | /ssd = 250GB | /ssd = 250GB | /ssd = 250GB | /ssd = 238.5GB
  • 8. Hadoop
    1 | Hadoop | Hadoop 1.2
    2 | HSearch | HSearch 0.94.4.41
    3 | JDK | JDK 1.6.0_45
    4 | HDFS JDK Memory | 4 GB
    5 | HSearch JDK Memory | 4 GB
  • 9. Learning
  • 10. First Run: 120 Sec (No-Cache), 98 Sec (Cache)
    Setup: 1250 models/machine with 1 SSD/machine. 1 HSearch instance/machine and max 64 threads/instance.
    Results: 120 Sec with OS cache disabled. 98 Sec with OS cache enabled.
    Observation: High I/O wait and low CPU usage. Software bottleneck with sequential I/O reads.
    Action Taken:
  • 11. Second Run: 115 Sec (No-Cache), 90 Sec (Cache)
    Setup: 1250 models/machine with 1 SSD/machine. 1 HSearch instance/machine and max 64 threads/instance.
    Results: 115 Sec with OS cache disabled. 90 Sec with OS cache enabled.
    Observation: Analysis of the application logs pointed to a DFSClient bottleneck.
    Action Taken: Introduced 2 HSearch instances per machine.
  • 12. Third Run: 70 Sec (No-Cache), 34 Sec (Cache)
    Setup: 1250 models/machine with 1 SSD/machine. 2 HSearch instances/machine and max 32 threads/instance.
    Results: 70 Sec with OS cache disabled. 33.8 Sec with OS cache enabled.
    Observation (No Cache): Average CPU usage 32%, max 43%; avg interrupts 17245; avg context switches 6365; avg I/O wait 9.16.
  • 13. Fourth Run: 32.8 Sec (No-Cache), 30.3 Sec (Cache)
    Setup: 1250 models/machine with 2 HSearch instances/machine. 4 SSDs/machine. Max 32 threads/instance. 40 ms delay on parallel thread launch.
    Results: 32.8 Sec with OS cache disabled. 30.3 Sec with OS cache enabled.
    Observation (No Cache): Average CPU usage 75%, max 97%; avg interrupts 48921; avg context switches 23376; avg I/O wait 2.5.
    Action Taken: More delay introduced to reduce contention.
  • 14. Fifth Run: 32.5 Sec (No-Cache), 32 Sec (Cache)
    Setup: 1250 models/machine with 2 HSearch instances/machine. 4 SSDs/machine. Max 32 threads/instance. 45 ms delay on parallel thread launch.
    Results: 32.504 Sec with OS cache disabled. 32.060 Sec with OS cache enabled.
    Observation (No Cache): Average CPU usage 55%, max 82%; avg interrupts 37564; avg context switches 9419; avg I/O wait 1.0.
    Action Taken: None
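
The figures quoted across the slides can be cross-checked with a few lines of arithmetic. The Python sketch below is not part of the original deck. It copies the timings from slides 10 to 14 and the workload size from slide 4, then derives the join size, the models per machine, and each run's speedup over the first run. The names `RUNS`, `join_size`, `speedup`, and `pairs_per_second` are hypothetical helpers introduced here for illustration. `pairs_per_second` additionally assumes that each run evaluates the full join exactly once, which the slides suggest but do not state.

```python
# Timings in seconds copied from slides 10-14:
# (OS cache disabled, OS cache enabled).
RUNS = {
    "first": (120.0, 98.0),
    "second": (115.0, 90.0),
    "third": (70.0, 33.8),
    "fourth": (32.8, 30.3),
    "fifth": (32.504, 32.060),
}

POSITIONS = 3_000_000  # slide 4
RISK_MODELS = 5_000    # slide 4
MACHINES = 4           # slide 7


def join_size(positions=POSITIONS, models=RISK_MODELS):
    """Number of position x model pairs in the join."""
    return positions * models


def speedup(run, baseline="first", cached=False):
    """Ratio of the baseline run's time to the given run's time."""
    idx = 1 if cached else 0
    return RUNS[baseline][idx] / RUNS[run][idx]


def pairs_per_second(run, cached=False):
    """Assumption (not stated in the slides): one run covers the full join once."""
    idx = 1 if cached else 0
    return join_size() / RUNS[run][idx]


if __name__ == "__main__":
    print(f"join size: {join_size():,} pairs")
    print(f"models per machine: {RISK_MODELS // MACHINES}")
    for name in RUNS:
        print(
            f"{name:>6}: {speedup(name):.2f}x no-cache, "
            f"{speedup(name, cached=True):.2f}x cached"
        )
```

The join size works out to the 15 billion calculations quoted on slide 3, and 5,000 models over 4 machines gives the 1250 models per machine listed in each setup. With the OS cache disabled, the fourth run is roughly 3.66 times faster than the first (120 / 32.8).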