CASE STUDY
Intel® Distribution for Apache Hadoop* Software
Intel® Xeon® Processor E5-2630 v2 Product Family
Big Data Analytics
Education
Data in the desert
Intel worked with Ben-Gurion University of the Negev to set up a Big Data Analytics Lab, enabling
information systems engineering students to better understand and develop complex machine-learning
algorithms. The Lab is one of the first in the world running Intel® Distribution for Apache Hadoop*
software together with Apache Spark*, on servers powered by the Intel® Xeon® processor E5-2630 v2
product family. By removing I/O bottlenecks with distributed RAM, Apache Spark offers much better
utilization of the Intel® processors. This project showcases the performance gains enabled through these
three elements working together.
Challenges
• Peak performance. The Department of Information Systems Engineering at Ben-Gurion University
wanted to roll out a more powerful computing cluster to enable its students to mine massive datasets
and work on research projects akin to those faced in industry.
Solutions
• Big Data Analytics Lab. Implemented a powerful new computing cluster running Intel Distribution
for Apache Hadoop software together with Apache Spark on seven HP ProLiant* DL360p Gen 8
servers powered by the Intel Xeon processor E5-2630 v2 product family and running Red Hat CentOS*.
Impact
• Keeping pace. The Big Data Analytics Lab allows students to mine even larger datasets than before
and develop more complex and distributed algorithms, enabling them to continue to work with industry
on resolving real-world Web mining and cyber security problems.
• New horizons. It has also made it possible for the university to run a master's course in mining
massive datasets, which focuses on implementing distributed machine-learning algorithms. Together with
strong, continued ties with industry, this properly prepares students for careers in the real world and
helps the university to attract the best engineering students.
Ben-Gurion University of the Negev opens Big Data Lab running Intel® Distribution for Apache Hadoop*
software on the Intel® Xeon® processor
“Intel® Distribution for Apache
Hadoop* software is optimized to
run on Intel® Xeon® processors.
This superior performance is
magnified with the addition of
Apache Spark*. By removing I/O
bottlenecks with distributed RAM,
Apache Spark secures even better
utilization of the Intel® processors.
The performance gains realized
through these three elements
working together are tremendous.”
Lior Rokach,
Professor,
Department of Information Systems
Engineering,
Ben-Gurion University
Ben-Gurion University
Ben-Gurion University was established in 1969
as the University of the Negev with the aim of
promoting the development of the Negev desert
that comprises more than 60 percent of Israel. The
University was inspired by the vision of Israel's
founder and first prime minister, David Ben-Gurion,
who believed that the future of the country lay
in this region. After Ben-Gurion's death in 1973,
the University was renamed Ben-Gurion University
of the Negev.
Today, Ben-Gurion University is a major center for
teaching and research with close to 20,000 students.
The Department of Information Systems
Engineering, which sits within the Faculty of Engineering,
prepares students for careers in the
analysis, design, development, use and management
of information systems.
The department has an excellent research program
and strong ties with industry, which enable its
fourth-year undergraduate and postgraduate students
to work on real-world information system
problems. The department also excels in filing
patent applications. In 2011, more than 20 percent
of patents filed by Ben-Gurion University were from
students in the Information Systems Engineering
Department.
Big data analytics
Many of the students' research projects require analyzing
very large datasets, especially those focused
on Web mining and cyber security. The emphasis
is on developing novel, sound and theoretically
motivated algorithms for accomplishing large-scale
tasks in data mining, such as high-dimensional
clustering, classification and anomaly
detection.
In the past, the research projects students were
able to undertake were limited by the department’s
computing resources. Generally, students had access
to just one server for their research, which meant
they were mostly only able to work on developing
non-scalable algorithms. If its research programs
were to keep pace with developments in the real
world, the department realized it needed to roll
out a more powerful computing cluster.
This document and the information given are for the convenience of Intel’s customer base and are provided “AS IS” WITH NO WARRANTIES WHATSOEVER, EXPRESS OR IMPLIED,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS. Receipt or
possession of this document does not grant any license to any of the intellectual property described, displayed, or contained herein. Intel® products are not intended for use in
medical, lifesaving, life-sustaining, critical control, or safety systems, or in nuclear facility applications.
Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit
the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect per-
formance of systems available for purchase.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark,
are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should
consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with
other products. For more information go to http://www.intel.com/performance.
Copyright ©2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Distribution for Apache Hadoop software, Intel Xeon and Xeon inside are trademarks of Intel
Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others. 0314/JNW/RLC/XX/PDF 330300-001EN
Computing clusters
Intel has an experienced Advanced Analytics Department
with extensive knowledge of machine
learning and big data analytics platforms such as
Apache Hadoop and Apache Spark. Intel Advanced
Analytics delivered a presentation to the University
explaining the benefits of Intel Distribution
for Apache Hadoop software in mining massive
datasets. The Apache Hadoop software library is
a framework that allows for the distributed processing
of large datasets across clusters of computers
using simple programming models.
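One such simple programming model is MapReduce, which Hadoop is widely known for. The sketch below mimics its three stages (map, shuffle, reduce) in a single Python process to show the data flow only. It does not use any Hadoop API, and the function names and the sample input are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    # On a real cluster, batches of records would be mapped on different
    # nodes; here everything runs in one process to show the data flow.
    emitted = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, not taken from the case study.
sample_records = ["big data in the desert", "big data lab"]
counts = word_count(sample_records)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both stages can in principle be spread across many machines, which is what makes the model suitable for a cluster.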
The university immediately recognized the benefits
of Intel Distribution for Apache Hadoop software
for better understanding and developing
machine-learning algorithms. Machine learning is
a branch of artificial intelligence concerning the
construction and study of systems that can learn
from data.
The university and Intel then worked together to
roll out a seven-node computing cluster. Initially,
just the Department of Information Systems Engineering
was involved in the project. But when
the benefits of the cluster became apparent, other
departments stepped up to donate budget to extend
the cluster into a shared resource. By pooling
resources, the university was able to roll out a
much more powerful computing cluster.
The Big Data Analytics Lab
The Big Data Analytics Lab is one of the first in
the world running Intel Distribution for Apache
Hadoop software 3.0.2, which is based on YARN*,
together with Apache Spark 0.8.1, on seven HP
ProLiant DL360p Gen 8 servers powered by the
Intel Xeon processor E5-2630 v2 product family
and running Red Hat CentOS. With a total capacity
of over 220 TB of disk space, over 448 GB of RAM and
84 cores, it is considered one of the largest Hadoop
clusters operating in an Israeli academic institution.
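To put the totals in perspective, the short calculation below divides them evenly across the seven servers. This is a rough sketch: it assumes identical nodes, which the case study does not state, and it treats the "over" figures for disk and RAM as lower bounds.

```python
# Cluster totals quoted in the case study.
NODES = 7
TOTAL_CORES = 84
TOTAL_RAM_GB = 448    # quoted as "over 448GB", so a lower bound
TOTAL_DISK_TB = 220   # quoted as "over 220TB", so a lower bound

# Assumption: resources are spread evenly across identical servers.
cores_per_node = TOTAL_CORES // NODES
ram_per_node_gb = TOTAL_RAM_GB / NODES
disk_per_node_tb = TOTAL_DISK_TB / NODES

print(f"Cores per node: {cores_per_node}")
print(f"RAM per node:   at least {ram_per_node_gb:.0f} GB")
print(f"Disk per node:  at least {disk_per_node_tb:.1f} TB")
```

Under that assumption each server contributes 12 cores, at least 64 GB of RAM and roughly 31 TB of disk. Twelve cores per server would be consistent with two six-core processors in each machine, although the source does not give the per-server configuration.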
Lior Rokach, professor in the Department of Information
Systems Engineering, said: “Intel Distribution
for Apache Hadoop software is optimized to
run on Intel Xeon processors. This superior performance
is magnified with the addition of Apache Spark.
By removing I/O bottlenecks with distributed RAM,
Apache Spark secures even better utilization of
the Intel processors. The performance gains realized
through these three elements working together
are tremendous. Ongoing management
of the cluster is also relatively simple.”
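The I/O point in the quote can be illustrated with a toy comparison. Iterative algorithms, which are common in machine learning, pass over the same dataset many times. Reading it from storage on every pass keeps the processors waiting on I/O, while keeping it in memory after the first read avoids that. The sketch below is plain Python and does not use Spark's API; `CountingSource` and both functions are hypothetical names, and the update rule is an arbitrary stand-in for a real training step.

```python
from typing import List


class CountingSource:
    """Hypothetical stand-in for slow storage that counts how often it is read."""

    def __init__(self, values: List[float]) -> None:
        self._values = values
        self.reads = 0

    def load(self) -> List[float]:
        self.reads += 1
        return list(self._values)


def iterate_from_storage(source: CountingSource, steps: int) -> float:
    """Re-read the dataset on every iteration (the I/O-bound pattern)."""
    estimate = 0.0
    for _ in range(steps):
        data = source.load()
        estimate = 0.5 * estimate + 0.5 * (sum(data) / len(data))
    return estimate


def iterate_from_memory(source: CountingSource, steps: int) -> float:
    """Read the dataset once, then reuse the in-memory copy on every iteration."""
    data = source.load()
    estimate = 0.0
    for _ in range(steps):
        estimate = 0.5 * estimate + 0.5 * (sum(data) / len(data))
    return estimate


disk_bound = CountingSource([1.0, 2.0, 3.0])
cached = CountingSource([1.0, 2.0, 3.0])

result_a = iterate_from_storage(disk_bound, steps=10)
result_b = iterate_from_memory(cached, steps=10)

print(f"Reads without caching: {disk_bound.reads}")
print(f"Reads with caching:    {cached.reads}")
```

Both variants compute the same result, but the second touches storage once instead of once per iteration. On a cluster, the in-memory copy is spread across the RAM of all nodes, which is the "distributed RAM" the quote refers to.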
Working with industry
The Big Data Analytics Lab is enabling research
students to mine even larger datasets. This means
they can better understand and develop more complex
machine-learning algorithms, enabling them
to continue to work with industry to resolve real-world
Web mining and cyber security problems.
For example, information systems engineering
students are working with several Internet service
providers (ISPs) to develop a machine-learning
algorithm to improve the page ranking system so
it is based on actual traffic to each site rather than
links within those sites. They are harnessing the
Hadoop cluster to mine over 100TB of data delivered
to them by the ISPs.
The Big Data Analytics Lab has made it possible
for the university to run a master's course in mining
massive datasets, which counts towards the
MSc degree with specialization in data mining and
business intelligence. Together with strong, continued
ties with industry, this properly prepares
Ben-Gurion students for careers in the real world.
It also helps the university to attract the best
engineering students, since it is able to offer
interesting research projects that other universities
cannot.
Ben-Gurion University will announce the opening
of the Big Data Analytics Lab at the Second Annual
Data Mining for Business Intelligence Conference,
of which Intel is a sponsor.
Lessons Learned
Intel® Distribution for Apache Hadoop* software
is the only distribution built from silicon up to
enable the widest range of data analysis
on Apache Hadoop. It is the first with hardware-enhanced
performance and security
capabilities. Optimized to run on Intel® Xeon®
processors, it offers great performance. Adding
Apache Spark, which removes I/O bottlenecks
with distributed RAM, offers even better
processor utilization. Looking to the future,
Intel is committed to further developing a
platform on which the entire ecosystem can
build next-generation analytics solutions.
Find the solution that’s right for your organization.
View success stories from your peers
(www.intel.co.uk/Itcasestudies), learn more
about server products for business
(www.intel.com/xeon) and check out the IT
Center, Intel’s resource for the IT Industry
(www.intel.co.uk/itcenter).
Enabling research students to mine even larger datasets

More Related Content

What's hot

Utilization of hpc in cancer research 2008 ric
Utilization of hpc in cancer research 2008 ricUtilization of hpc in cancer research 2008 ric
Utilization of hpc in cancer research 2008 ric
BIT002
 
resume v 5.0
resume v 5.0resume v 5.0
resume v 5.0
Ye Xu
 
Ingredients for Semantic Sensor Networks
Ingredients for Semantic Sensor NetworksIngredients for Semantic Sensor Networks
Ingredients for Semantic Sensor Networks
Oscar Corcho
 

What's hot (6)

Utilization of hpc in cancer research 2008 ric
Utilization of hpc in cancer research 2008 ricUtilization of hpc in cancer research 2008 ric
Utilization of hpc in cancer research 2008 ric
 
resume v 5.0
resume v 5.0resume v 5.0
resume v 5.0
 
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUES
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUESAPPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUES
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUES
 
Intel big data analytics in health and life sciences personalized medicine
Intel big data analytics in health and life sciences personalized medicineIntel big data analytics in health and life sciences personalized medicine
Intel big data analytics in health and life sciences personalized medicine
 
Past, Present, and Future of Analyzing Software Data
Past, Present, and Future of Analyzing Software DataPast, Present, and Future of Analyzing Software Data
Past, Present, and Future of Analyzing Software Data
 
Ingredients for Semantic Sensor Networks
Ingredients for Semantic Sensor NetworksIngredients for Semantic Sensor Networks
Ingredients for Semantic Sensor Networks
 

Viewers also liked

Android keyboard & layout
Android keyboard & layoutAndroid keyboard & layout
Android keyboard & layout
운용 최
 
Medicion de-estres-diapositivas
Medicion de-estres-diapositivasMedicion de-estres-diapositivas
Medicion de-estres-diapositivas
jlehorreta33
 
P6.katep
P6.katepP6.katep
P6.katep
rm6wc
 
Mind Bogglers presentation 'quiz for kids'
Mind Bogglers  presentation 'quiz for kids'Mind Bogglers  presentation 'quiz for kids'
Mind Bogglers presentation 'quiz for kids'
HillsideHighSchool
 
Voice Recognition Wireless Home Automation System Based On Zigbee
Voice Recognition Wireless Home Automation System Based On ZigbeeVoice Recognition Wireless Home Automation System Based On Zigbee
Voice Recognition Wireless Home Automation System Based On Zigbee
IOSR Journals
 
Aqua Brands introductin
Aqua Brands introductinAqua Brands introductin
Aqua Brands introductin
Waseem Zataar
 

Viewers also liked (13)

Android keyboard & layout
Android keyboard & layoutAndroid keyboard & layout
Android keyboard & layout
 
Medicion de-estres-diapositivas
Medicion de-estres-diapositivasMedicion de-estres-diapositivas
Medicion de-estres-diapositivas
 
Inforln.com ERP LN 10.3 & 10.4 Configurable Contract Deliverables Enhancements
Inforln.com ERP LN 10.3 & 10.4 Configurable Contract Deliverables EnhancementsInforln.com ERP LN 10.3 & 10.4 Configurable Contract Deliverables Enhancements
Inforln.com ERP LN 10.3 & 10.4 Configurable Contract Deliverables Enhancements
 
2011 ch 2
2011 ch 22011 ch 2
2011 ch 2
 
U17's plans
U17's plansU17's plans
U17's plans
 
Quang cao pho dong hoa sen
Quang cao pho dong hoa senQuang cao pho dong hoa sen
Quang cao pho dong hoa sen
 
VNSISPL_DBMS_Concepts_ch14
VNSISPL_DBMS_Concepts_ch14VNSISPL_DBMS_Concepts_ch14
VNSISPL_DBMS_Concepts_ch14
 
Estratègies per fer créixer el teu negoci a Turquia
Estratègies per fer créixer el teu negoci a TurquiaEstratègies per fer créixer el teu negoci a Turquia
Estratègies per fer créixer el teu negoci a Turquia
 
HCLT Whitepaper: Thermal Design and Management of Servers
HCLT Whitepaper: Thermal Design and Management of ServersHCLT Whitepaper: Thermal Design and Management of Servers
HCLT Whitepaper: Thermal Design and Management of Servers
 
P6.katep
P6.katepP6.katep
P6.katep
 
Mind Bogglers presentation 'quiz for kids'
Mind Bogglers  presentation 'quiz for kids'Mind Bogglers  presentation 'quiz for kids'
Mind Bogglers presentation 'quiz for kids'
 
Voice Recognition Wireless Home Automation System Based On Zigbee
Voice Recognition Wireless Home Automation System Based On ZigbeeVoice Recognition Wireless Home Automation System Based On Zigbee
Voice Recognition Wireless Home Automation System Based On Zigbee
 
Aqua Brands introductin
Aqua Brands introductinAqua Brands introductin
Aqua Brands introductin
 

Similar to Ben gurion university_data_desert

Cloudwatt pioneers big_data
Cloudwatt pioneers big_dataCloudwatt pioneers big_data
Cloudwatt pioneers big_data
xband
 
Memory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective ViewMemory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective View
ijtsrd
 
Tony Reid Resume
Tony Reid ResumeTony Reid Resume
Tony Reid Resume
storyhome
 
SiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdfSiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdf
Siddhartha Mitra
 

Similar to Ben gurion university_data_desert (20)

2019 DSA 105 Introduction to Data Science Week 4
2019 DSA 105 Introduction to Data Science Week 42019 DSA 105 Introduction to Data Science Week 4
2019 DSA 105 Introduction to Data Science Week 4
 
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceIntroduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
 
Cloudwatt pioneers big_data
Cloudwatt pioneers big_dataCloudwatt pioneers big_data
Cloudwatt pioneers big_data
 
Gurney · SlidesCarnival.pptx
Gurney · SlidesCarnival.pptxGurney · SlidesCarnival.pptx
Gurney · SlidesCarnival.pptx
 
Memory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective ViewMemory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective View
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
OA centre of excellence
OA centre of excellenceOA centre of excellence
OA centre of excellence
 
i_Venkata_Sai_Manoj_Illendula_Resume
i_Venkata_Sai_Manoj_Illendula_Resumei_Venkata_Sai_Manoj_Illendula_Resume
i_Venkata_Sai_Manoj_Illendula_Resume
 
RESUME_RAVI
RESUME_RAVIRESUME_RAVI
RESUME_RAVI
 
Special Purpose IBM Center of excellence lab
Special Purpose IBM Center of excellence lab Special Purpose IBM Center of excellence lab
Special Purpose IBM Center of excellence lab
 
IBM and Apache Spark
IBM and Apache SparkIBM and Apache Spark
IBM and Apache Spark
 
PPT5: Neuron Introduction
PPT5: Neuron IntroductionPPT5: Neuron Introduction
PPT5: Neuron Introduction
 
Tony Reid Resume
Tony Reid ResumeTony Reid Resume
Tony Reid Resume
 
IRJET- A Workflow Management System for Scalable Data Mining on Clouds
IRJET- A Workflow Management System for Scalable Data Mining on CloudsIRJET- A Workflow Management System for Scalable Data Mining on Clouds
IRJET- A Workflow Management System for Scalable Data Mining on Clouds
 
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
 
Implementing load balancing algorithm in middleware system of volunteer cloud...
Implementing load balancing algorithm in middleware system of volunteer cloud...Implementing load balancing algorithm in middleware system of volunteer cloud...
Implementing load balancing algorithm in middleware system of volunteer cloud...
 
Frankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee ProjeectFrankfurt Big Data Lab & Refugee Projeect
Frankfurt Big Data Lab & Refugee Projeect
 
My Curriculum Vitae
My Curriculum VitaeMy Curriculum Vitae
My Curriculum Vitae
 
SiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdfSiddharthaMitra_resume_pdf
SiddharthaMitra_resume_pdf
 
BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 

More from xband

Charles la trobe_college_learning_without_limits
Charles la trobe_college_learning_without_limitsCharles la trobe_college_learning_without_limits
Charles la trobe_college_learning_without_limits
xband
 
Fujitsu spain revolutionizing_public_administration
Fujitsu spain revolutionizing_public_administrationFujitsu spain revolutionizing_public_administration
Fujitsu spain revolutionizing_public_administration
xband
 

More from xband (20)

Talos threat-intelligence
Talos threat-intelligenceTalos threat-intelligence
Talos threat-intelligence
 
Preventing Data Breaches
Preventing Data BreachesPreventing Data Breaches
Preventing Data Breaches
 
Data Center Server security
Data Center Server securityData Center Server security
Data Center Server security
 
Complete Endpoint protection
Complete Endpoint protectionComplete Endpoint protection
Complete Endpoint protection
 
Advanced Threat Defense Intel Security
Advanced Threat Defense  Intel SecurityAdvanced Threat Defense  Intel Security
Advanced Threat Defense Intel Security
 
Security Transformation Services
Security Transformation ServicesSecurity Transformation Services
Security Transformation Services
 
Security Operations and Response
Security Operations and ResponseSecurity Operations and Response
Security Operations and Response
 
Information Risk and Protection
Information Risk and ProtectionInformation Risk and Protection
Information Risk and Protection
 
IBM Security Strategy Overview
IBM Security Strategy OverviewIBM Security Strategy Overview
IBM Security Strategy Overview
 
API Connect Presentation
API Connect PresentationAPI Connect Presentation
API Connect Presentation
 
Verizon Data Breach Investigation Report
Verizon Data Breach Investigation ReportVerizon Data Breach Investigation Report
Verizon Data Breach Investigation Report
 
Big Fix Q-Radar Ahmed Sharaf - EmbeddedSecurity.net
Big Fix Q-Radar Ahmed Sharaf - EmbeddedSecurity.netBig Fix Q-Radar Ahmed Sharaf - EmbeddedSecurity.net
Big Fix Q-Radar Ahmed Sharaf - EmbeddedSecurity.net
 
Bridging the Data Security Gap
Bridging the Data Security GapBridging the Data Security Gap
Bridging the Data Security Gap
 
Hipaa Omnibus Final-Rule-eResource
Hipaa Omnibus Final-Rule-eResourceHipaa Omnibus Final-Rule-eResource
Hipaa Omnibus Final-Rule-eResource
 
The Total Economic Impact™ Of Cisco Data Virtualization
The Total Economic Impact™ Of Cisco Data VirtualizationThe Total Economic Impact™ Of Cisco Data Virtualization
The Total Economic Impact™ Of Cisco Data Virtualization
 
Assessing the Business Value of SDN Datacenter Security Solutions
Assessing the Business Value of SDN Datacenter Security SolutionsAssessing the Business Value of SDN Datacenter Security Solutions
Assessing the Business Value of SDN Datacenter Security Solutions
 
Big Data, Little Data, and Everything in Between
Big Data, Little Data, and Everything in BetweenBig Data, Little Data, and Everything in Between
Big Data, Little Data, and Everything in Between
 
2015 cost of data breach study global analysis
2015 cost of data breach study global analysis2015 cost of data breach study global analysis
2015 cost of data breach study global analysis
 
Charles la trobe_college_learning_without_limits
Charles la trobe_college_learning_without_limitsCharles la trobe_college_learning_without_limits
Charles la trobe_college_learning_without_limits
 
Fujitsu spain revolutionizing_public_administration
Fujitsu spain revolutionizing_public_administrationFujitsu spain revolutionizing_public_administration
Fujitsu spain revolutionizing_public_administration
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Ben gurion university_data_desert

  • 1. CASE STUDY Intel®Distribution for Apache Hadoop* Software Intel®Xeon®Processor E5-2630 v2 Product Family Big Data Analytics Education Data in the desert Intel worked with Ben-Gurion University of the Negev to set up a Big Data Analytics Lab, enabling in- formation systems engineering students to better understand and develop complex machine-learning algorithms. The Lab is one of the first in the world running Intel® Distribution for Apache Hadoop* software together with Apache Spark*, on servers powered by the Intel® Xeon® processor E5-2630 v2 product family. By removing I/O bottlenecks with distributed RAM, Apache Spark offers much better utilization of the Intel® processors. This project showcases the performance gains enabled through these three elements working together. Challenges • Peak performance. The Department of Information Systems Engineering at Ben-Gurion University wanted to roll out a more powerful computing cluster to enable its students to mine massive datasets and work on research projects akin to those faced in industry Solutions • Big Data Analytics Lab. Implemented a powerful new computing cluster running Intel Distribution for Apache Hadoop software together with Apache Spark on seven HP ProLiant* DL360p Gen 8 servers powered by the Intel Xeon processor E5-2630 v2 product family and running a Red Hat CentOS* Impact • Keeping pace. The Big Data Analytics Lab allows students to mine even larger datasets than before and develop more complex and distributed algorithms, enabling them to continue to work with industry in the resolution of real-world Web mining and cyber security problems • New horizons. It has also made it possible for the university to run a masters course in mining mas- sive datasets which focus on implementing distributed machine-learning algorithms. 
Together with strong, continued ties with industry, this properly prepares students for careers in the real world and helps the university to attract the best engineering students Ben-Gurion University of the Negev opens Big Data Lab running Intel®Distribution for Apache Hadoop* software on the Intel®Xeon®processor “Intel® Distribution for Apache Hadoop* software is optimized to run on Intel® Xeon® processors. This superior performance is magnified with the addition of Apache Spark*. By removing I/O bottlenecks with distributed RAM, Apache Spark secures even better utilization of the Intel® processors. The performance gains realized through these three elements working together are tremendous.” Lior Rokach, Professor, Department of Information Systems Engineering, Ben-Gurion University Ben-Gurion University Ben-Gurion University was established in 1969 as the University of the Negev with the aim of promoting the development of the Negev desert that comprises more than 60 percent of Israel. The University was inspired by the vision of Israel's founder and first prime minister, David Ben-Gurion, who believed that the future of the country lay in this region. After Ben-Gurion's death in 1973, the University was renamed Ben-Gurion University of the Negev. Today, Ben-Gurion University is a major center for teaching and research with close to 20,000 stu- dents. The Department of Information Systems Engineering, which sits within the Faculty of En- gineering, prepares students for careers in the analysis, design, development, use and manage- ment of information systems. The department has an excellent research program and strong ties with industry, which enables its fourth-year undergraduate and postgraduate stu- dents to work on real-world information system problems. The department also excels in filing patent applications. In 2011, more than 20 percent of patents filed by Ben-Gurion University were from students in the Information Systems Engineering Department. 
Big data analytics
Many of the students' research projects require analyzing very large datasets, especially those focused on Web mining and cyber security. The emphasis is on developing novel, sound and theoretically motivated algorithms for accomplishing large-scale tasks in data mining, such as high-dimensional clustering, classification algorithms and anomaly detection.
In the past, the research projects students were able to undertake were limited by the department’s computing resources. Generally, students had access to just one server for their research, which meant they were mostly only able to work on developing non-scalable algorithms. If its research programs were to keep pace with developments in the real world, the department realized it needed to roll out a more powerful computing cluster.
  • 2. This document and the information given are for the convenience of Intel’s customer base and are provided “AS IS” WITH NO WARRANTIES WHATSOEVER, EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS. Receipt or possession of this document does not grant any license to any of the intellectual property described, displayed, or contained herein. Intel® products are not intended for use in medical, lifesaving, life-sustaining, critical control, or safety systems, or in nuclear facility applications.
Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
Copyright ©2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Distribution for Apache Hadoop software, Intel Xeon and Xeon inside are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
0314/JNW/RLC/XX/PDF 330300-001EN
Computing clusters
Intel has an experienced Advanced Analytics Department with extensive knowledge of machine learning and big data analytics platforms such as Apache Hadoop and Apache Spark. Intel Advanced Analytics delivered a presentation to the University explaining the benefits of Intel Distribution for Apache Hadoop software in mining massive datasets. The Apache Hadoop software library is a framework that allows for the distributed processing of large datasets across clusters of computers using simple programming models.
The university immediately recognized the benefits of Intel Distribution for Apache Hadoop software for better understanding and developing machine-learning algorithms. Machine learning is a branch of artificial intelligence concerning the construction and study of systems that can learn from data.
The university and Intel then worked together to roll out a seven-node computing cluster. Initially, just the Department of Information Systems Engineering was involved in the project. But when the benefits of the cluster became apparent, other departments stepped up to donate budget to extend the cluster into a shared resource. By pooling resources, the university was able to roll out a much more powerful computing cluster.
The Big Data Analytics Lab
The Big Data Analytics Lab is one of the first in the world running Intel Distribution for Apache Hadoop software 3.0.2, which is based on YARN*, together with Apache Spark 0.8.1, on seven HP ProLiant DL360p Gen 8 servers powered by the Intel Xeon processor E5-2630 v2 product family and running Red Hat CentOS. With a total capacity of over 220TB of disk space, over 448GB RAM and 84 cores, it is considered one of the largest Hadoop clusters operating in an Israeli academic institution.
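To put the quoted totals in per-server terms, the short sketch below divides them across the seven nodes. It assumes the servers are identically configured, which the case study does not state, and the helper name `per_node` is purely illustrative. Because the disk and RAM totals are quoted as "over" a figure, the results are best read as approximate lower bounds.

```python
# Cluster totals as quoted in the case study. Disk and RAM are stated as
# "over" these values, so the per-node results are approximate lower bounds.
NODES = 7
TOTAL_CORES = 84
TOTAL_RAM_GB = 448
TOTAL_DISK_TB = 220


def per_node(total: float, nodes: int = NODES) -> float:
    """Average share of a cluster-wide total, ASSUMING identical nodes."""
    return total / nodes


cores_per_node = per_node(TOTAL_CORES)
ram_per_node_gb = per_node(TOTAL_RAM_GB)
disk_per_node_tb = per_node(TOTAL_DISK_TB)
ram_per_core_gb = TOTAL_RAM_GB / TOTAL_CORES

print(f"cores per node:   {cores_per_node:.0f}")
print(f"RAM per node:     {ram_per_node_gb:.0f} GB")
print(f"disk per node:    {disk_per_node_tb:.1f} TB")
print(f"RAM per core:     {ram_per_core_gb:.2f} GB")
```

Under that assumption each server contributes 12 cores, about 64GB of RAM and roughly 31TB of disk. Twelve cores per server would be consistent with two six-core processors in each machine, although the case study does not give the socket count.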
Lior Rokach, professor in the Department of Information Systems Engineering, said: “Intel Distribution for Apache Hadoop software is optimized to run on Intel Xeon processors. This superior performance is magnified with the addition of Apache Spark. By removing I/O bottlenecks with distributed RAM, Apache Spark secures even better utilization of the Intel processors. The performance gains realized through these three elements working together are tremendous. Ongoing management of the cluster is also relatively simple.”
Working with industry
The Big Data Analytics Lab is enabling research students to mine even larger datasets. This means they can better understand and develop more complex machine-learning algorithms, enabling them to continue to work with industry to resolve real-world Web mining and cyber security problems.
For example, information systems engineering students are working with several Internet service providers (ISPs) to develop a machine-learning algorithm to improve the page ranking system so it is based on actual traffic to each site rather than links within those sites. They are harnessing the Hadoop cluster to mine over 100TB of data delivered to them by the ISPs.
The Big Data Analytics Lab has made it possible for the university to run a master's course in mining massive datasets, which counts towards the MSc degree with specialization in data mining and business intelligence. Together with strong, continued ties with industry, this properly prepares Ben-Gurion students for careers in the real world. It also helps the university to attract the best engineering students, since it is able to offer interesting research projects that other universities cannot.
Ben-Gurion University will announce the opening of the Big Data Analytics Lab at the Second Annual Data Mining for Business Intelligence Conference, of which Intel is a sponsor.
Lessons Learned
Intel® Distribution for Apache Hadoop* software is the only distribution built from the silicon up to enable the widest range of data analysis on Apache Hadoop. It is the first with hardware-enhanced performance and security capabilities. Optimized to run on Intel® Xeon® processors, it offers great performance. Adding Apache Spark, which removes I/O bottlenecks with distributed RAM, offers even better processor utilization. Looking to the future, Intel is committed to further developing a platform on which the entire ecosystem can build next-generation analytics solutions.
Find the solution that’s right for your organization. View success stories from your peers (www.intel.co.uk/Itcasestudies), learn more about server products for business (www.intel.com/xeon) and check out the IT Center, Intel’s resource for the IT Industry (www.intel.co.uk/itcenter).
Enabling research students to mine even larger datasets