This document summarizes the roles of servers in a Hadoop cluster, including manager nodes, name nodes, edge nodes, and data nodes. It discusses hardware considerations for Hadoop cluster design, such as CPU-to-memory-to-disk ratios for different use cases. It also provides an overview of Dell's Hadoop solutions, which integrate PowerEdge servers and Dell Networking switches with support from Etu for analytic software and from Dell Professional Services for implementation. It closes with a brief look at future directions, including in-memory processing and virtualized Hadoop deployments.
View this presentation to gain insight into optimizing Postgres and finding savings in your data management. Visit EnterpriseDB's Resources > Webcasts page to view the presentation by Jay Barrows, VP of Field Operations.
During this 45-minute presentation, Jay Barrows, VP of Field Operations, will provide a business review of how, where, and why businesses are leveraging PostgreSQL. He will also go over the primary pains and business drivers shaping the data management landscape, such as significant cost pressures combined with recent improvements to open source database options. Oracle migration is often considered the most powerful cost reduction opportunity, provided you understand the migration risks and have a clear migration game plan.
Jay will discuss several use cases that highlight how enterprise customers are applying what they learned from adopting other OSS products to bring Postgres to the extremely expensive, mission-critical part of their IT stack: the database. By doing so, they are driving TCO down in meaningful ways while sacrificing nothing in performance, scalability, security, or reliability. Many businesses already use OSS in much lower-cost parts of the IT stack (OS, middleware).
This presentation will be beneficial to decision-makers interested in enhancing their data management with PostgreSQL.
Big data processing meets non-volatile memory: opportunities and challenges DataWorks Summit
Advanced big data processing frameworks have been proposed to harness the fast data transmission capability of remote direct memory access (RDMA) over InfiniBand and RoCE. However, with the introduction of non-volatile memory (NVM), these designs, along with the default execution models such as MapReduce and Directed Acyclic Graph (DAG), need to be re-assessed to discover opportunities for further performance gains.
In this context, we propose an accelerated execution framework (NVMD) for MapReduce and DAG that leverages the benefits of NVM and RDMA. NVMD introduces novel features for MapReduce and DAG, such as a hybrid push and pull shuffle mechanism and dynamic adaptation to network congestion. The design has been incorporated into Apache Hadoop and Tez. Performance results illustrate that NVMD can achieve up to 3.65x and 3.18x improvement for Hadoop and Tez, respectively. In this talk, we will also present an NVM-aware HDFS design and its benefits for MapReduce, Spark, and HBase.
Speaker: Shashank Gugnani, PhD Student, Ohio State University
The talk covers the project to replace all IBM products in the company, using the databases as the example: the goal of the project, the lessons learned, and a short overview of the options.
We migrated about 500 DB2 databases to EnterpriseDB. Database sizes ranged from small up to 4 TB, and we implemented a completely new, fully automated deployment of VMs and databases. The databases have now been in production for 11 months. The talk gives an overview of the project, the lessons learned, and a few technical parameters that proved important for stability and performance.
Are you facing the challenge to meet growing IT requirements while operating on a limited budget?
Learn more about why you should transform your database management system (DBMS) and make open source part of your strategic business and IT choices. An open source DBMS offers you various benefits, including cost reduction, liberation from vendor lock-in, and a large development community. Paired with enterprise-class services, 24x7 support and reliable management tools, open source is a first class alternative to traditional proprietary DBMSs.
Open Source Software on OpenPOWER systems.
With 100% open source system software (including the firmware), OpenPOWER is the most open server architecture in the market. Based on the IBM POWER8 chip, this new family of servers featuring the latest Nvidia NVLink technology runs all the software solutions presented at OPEN'16 with significant cost advantages. This session explains how Docker, EnterpriseDB and many others benefit from this advanced design, and how 200+ technology companies including Google and RackSpace are collaborating in an open development alliance to build the datacenter of the future.
IBM Power9 Servers are here! Launched this week, the AC922 POWER9 servers will form the basis of the world’s fastest “Coral” supercomputers coming to ORNL and LLNL. Built specifically for compute-intensive AI workloads, the new POWER9 systems are capable of improving the training times of deep learning frameworks by nearly 4x allowing enterprises to build more accurate AI applications, faster.
Listen to the Radio Free HPC podcast on Power9: https://insidehpc.com/2017/12/radio-free-hpc-looks-new-power9-titan-v-snapdragon-845/
Learn more: https://www.ibm.com/us-en/marketplace/power-systems-ac922
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this webinar, we will discuss different open-source models and different ways open source communities are organized. Understanding these key concepts is essential when selecting a strategic open-source platform. We will explore how the PostgreSQL community ensures that it stays independent, remains vibrant, drives innovation, and provides a reliable long-term platform for strategic IT projects.
Best Practices & Lessons Learned from Deployment of PostgreSQL EDB
This talk will review best practices and lessons learned from working with large and mid-size companies on their deployment of PostgreSQL. We will explore the practices that helped industry leaders move through the stages of PostgreSQL adoption and get as much value out of their deployment as possible without incurring undue risk.
Bare-metal performance for Big Data workloads on Docker containers BlueData, Inc.
In a benchmark study, Intel® compared the performance of Big Data workloads running on a bare-metal deployment versus running in Docker* containers with the BlueData® EPIC™ software platform.
This in-depth study shows that performance ratios for container-based Hadoop workloads on BlueData EPIC are equal to — and in some cases, better than — bare-metal Hadoop. For example, benchmark tests showed that the BlueData EPIC platform demonstrated an average 2.33% performance gain over bare metal, for a configuration with 50 Hadoop compute nodes and 10 terabytes (TB) of data. These performance results were achieved without any modifications to the Hadoop software.
This is a revolutionary milestone, and the result of an ongoing collaboration between Intel and BlueData software engineering teams.
This white paper describes the software and hardware configurations for the benchmark tests, as well as details of the performance benchmark process and results.
Red Hat's Ross Turk took the podium at the Public Sector Red Hat Storage Days on 1/20/16 and 1/21/16 to explain just why software-defined storage matters.
How to use postgresql.conf to configure and tune the PostgreSQL server EDB
Tuning your PostgreSQL server plays an important role in getting the most out of your server resources, and running with default parameters is not always enough. With the PostgreSQL server configuration file, postgresql.conf, you can tune the areas that matter for your workload. The tuning parameters in postgresql.conf fall into several categories, including database connections, memory, optimizers, and logging.
In this webinar, you will learn:
- A basic understanding of postgresql.conf
- The categories and parameters of postgresql.conf
- How to adjust parameters
- Expert tuning recommendations
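As a rough illustration of the four parameter categories named above (connections, memory, optimizer, logging), the sketch below derives starting values from total RAM and renders them as postgresql.conf lines. The helper names `suggest_settings` and `render_conf` are hypothetical. The 25% figure for `shared_buffers` follows the starting point suggested in the PostgreSQL documentation; the 75% figure for `effective_cache_size`, the `work_mem` formula, and the 500ms logging threshold are assumed rules of thumb, not recommendations from the webinar.

```python
def suggest_settings(total_ram_mb: int, max_connections: int = 100) -> dict[str, str]:
    """Return hypothetical starting values for a dedicated PostgreSQL server."""
    # Memory: about 25% of RAM is a commonly cited starting point.
    shared_buffers_mb = total_ram_mb // 4
    # Optimizer: planner hint for how much data the OS and PostgreSQL
    # can cache together (assumed here to be ~75% of RAM).
    effective_cache_mb = total_ram_mb * 3 // 4
    # work_mem applies per sort/hash operation, so the remaining memory is
    # divided conservatively (assumed heuristic), never below the 4MB default.
    work_mem_mb = max(4, (total_ram_mb - shared_buffers_mb) // (max_connections * 4))
    return {
        "max_connections": str(max_connections),      # connections
        "shared_buffers": f"{shared_buffers_mb}MB",    # memory
        "work_mem": f"{work_mem_mb}MB",                # memory
        "effective_cache_size": f"{effective_cache_mb}MB",  # optimizer
        "log_min_duration_statement": "500ms",         # logging (assumed threshold)
    }


def render_conf(settings: dict[str, str]) -> str:
    """Format settings as 'name = value' lines, the syntax postgresql.conf uses."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    print(render_conf(suggest_settings(total_ram_mb=16384)))
```

Values like these are only a starting point; they would normally be adjusted after observing the actual workload.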
To segment effectively, you need to understand what drives the segments, not just how to measure them. That's where qualitative insight comes in.
Please credit the author if you use the material. Some images are subject to copyright.
A global qualitative study was held with people in 11 countries to find out what they thought about the January 2017 women's march.
The study was conducted by Think Global Qualitative, a global network of senior qualitative specialists.
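Trinity greatly improves an enterprise's competitiveness in the face of large, fast-changing information flows.
Today, enterprise BI is mostly built on RDBMSs and involves large volumes of ETL and data-exchange jobs. Once Hadoop Big Data applications are introduced, interfacing them effectively with the existing BI systems, and then integrating them further to realize overall synergy, becomes a challenge.
With its superior architecture, Trinity establishes seamless exchange between traditional Structured Data and Hadoop Big Data applications, letting information analysts work directly in the ways they already know. This greatly reduces the learning curve when adopting Big Data applications and the staff effort later spent on system operation and maintenance.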
If it's going to work, you need to involve people outside the marketing function. Actually, you have a change management project on your hands. See why.
HP Converged Systems and Hortonworks - Webinar Slides Hortonworks
Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around HP and Hortonworks Data Platform to get you started on building your modern data architecture.
Learn how to:
- Leverage best practices for deployment
- Choose a deployment model
- Design your Hadoop cluster
- Build a Modern Data Architecture and vision for the Data Lake
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage Alluxio, Inc.
Alluxio Tech Talk
Jul 17, 2019
Speakers:
Brien Porter, Intel
Alex Ma, Alluxio
The ever-increasing challenge of processing and extracting value from exploding data with AI and analytics workloads makes a memory-centric architecture with disaggregated storage and compute more attractive. This decoupled architecture enables users to innovate faster and scale on demand. Enterprises are also increasingly looking to object stores to power their big data and machine learning workloads in a cost-effective way. However, object stores provide neither big-data-compatible APIs nor the required performance.
In this webinar, the Intel and Alluxio teams will present a proposed reference architecture using Alluxio as the in-memory accelerator for object stores to enable modern analytical workloads such as Spark, Presto, Tensorflow, and Hive. We will also present a technical overview of Alluxio.
New Ceph capabilities and Reference Architectures Kamesh Pemmaraju
Have you heard about Inktank Ceph and are you interested in learning some tips and tricks for getting started quickly and efficiently with Ceph? Then this is the session for you!
In this two-part session you will learn about:
• the very latest enhancements and capabilities delivered in Inktank Ceph Enterprise, such as a new erasure-coded storage back end, support for tiering, and the introduction of user quotas.
• best practices, lessons learned, and architecture considerations drawn from real customer deployments of Dell and Inktank Ceph solutions that will help accelerate your Ceph deployment.
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About? Red_Hat_Storage
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About? By: Kamesh Pemmaraju, Neil Levine
Have you heard about Inktank Ceph and are you interested in learning some tips and tricks for getting started quickly and efficiently with Ceph? Then this is the session for you! In this two-part session you will learn about:
• the very latest enhancements and capabilities delivered in Inktank Ceph Enterprise, such as a new erasure-coded storage back end, support for tiering, and the introduction of user quotas.
• best practices, lessons learned, and architecture considerations drawn from real customer deployments of Dell and Inktank Ceph solutions that will help accelerate your Ceph deployment.
Improving Apache Spark by Taking Advantage of Disaggregated Architecture Databricks
Shuffle in Apache Spark is an intermediate phase that redistributes data across computing units, and it relies on one important primitive: the shuffle data is persisted on local disks. This architecture suffers from scalability and reliability issues. Moreover, the assumption of collocated storage does not always hold in today's data centers. The hardware trend is moving toward disaggregated storage and compute architectures for better cost efficiency and scalability.
To address the issues of Spark shuffle and support disaggregated storage and compute architecture, we implemented a new remote Spark shuffle manager. This new architecture writes shuffle data to a remote cluster with different Hadoop-compatible filesystem backends.
Firstly, the failure of compute nodes will no longer cause shuffle data recomputation. Spark executors can also be allocated and recycled dynamically which results in better resource utilization.
Secondly, most customers currently running Spark with collocated storage find it challenging to upgrade the disks on every node to the latest hardware, such as NVMe SSDs and persistent memory, because of cost and system compatibility concerns. With this new shuffle manager, they are free to build a separate cluster that stores and serves the shuffle data, leveraging the latest hardware to improve performance and reliability.
Thirdly, in the HPC world, more customers are trying Spark as their high-performance data analytics tool, while storage and compute in HPC clusters are typically disaggregated. This work will make their lives easier.
In this talk, we will present an overview of the issues of the current Spark shuffle implementation, the design of new remote shuffle manager, and a performance study of the work.
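The idea of writing shuffle data to storage that outlives the compute node can be sketched with a toy model. This is not Spark's shuffle manager API and not the implementation described in the talk: `RemoteShuffleStore`, `map_task`, and `reduce_task` are hypothetical names, and an in-memory dictionary stands in for the remote Hadoop-compatible filesystem.

```python
import zlib
from collections import defaultdict


class RemoteShuffleStore:
    """Toy stand-in for a remote storage cluster holding shuffle blocks."""

    def __init__(self) -> None:
        # (shuffle_id, reduce_id) -> list of (key, value) records
        self._blocks: dict[tuple[int, int], list[tuple[str, int]]] = defaultdict(list)

    def write(self, shuffle_id: int, reduce_id: int, records: list[tuple[str, int]]) -> None:
        self._blocks[(shuffle_id, reduce_id)].extend(records)

    def read(self, shuffle_id: int, reduce_id: int) -> list[tuple[str, int]]:
        return list(self._blocks.get((shuffle_id, reduce_id), []))


def partition_for(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioning (crc32 is stable across processes)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(store: RemoteShuffleStore, shuffle_id: int,
             records: list[tuple[str, int]], num_reducers: int) -> None:
    """Bucket records by reducer and write them to the remote store.

    Nothing is kept on the mapper's local disk, so the mapper can be
    released (or fail) after this returns without forcing recomputation.
    """
    buckets: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append((key, value))
    for reduce_id, bucket in buckets.items():
        store.write(shuffle_id, reduce_id, bucket)


def reduce_task(store: RemoteShuffleStore, shuffle_id: int, reduce_id: int) -> dict[str, int]:
    """Fetch one partition from the remote store and sum values per key."""
    totals: dict[str, int] = defaultdict(int)
    for key, value in store.read(shuffle_id, reduce_id):
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    store = RemoteShuffleStore()
    map_task(store, 0, [("a", 1), ("b", 1), ("a", 1)], num_reducers=2)
    map_task(store, 0, [("c", 2), ("a", 1)], num_reducers=2)
    merged: dict[str, int] = {}
    for rid in range(2):
        merged.update(reduce_task(store, 0, rid))
    print(merged)
```

Because every key maps to exactly one partition, the per-reducer results can be merged without conflicts, and the reducers depend only on the store, not on the mappers that produced the data.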
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme... VMworld
VMworld 2013
Abhishek Kashyap, Pivotal
Kevin Leong, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Hadoop makes data storage and processing at scale available as a lower-cost, open solution. If you ever wanted to get your feet wet but found the elephant intimidating, fear no more.
We will explore several integration considerations from a Windows application perspective, such as accessing HDFS content, writing streaming jobs, using the .NET SDK, and running HDInsight on premises or on Azure.
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
VMworld 2013
Michael Corey, Ntirety, Inc
Jeff Szastak, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
Join Hortonworks and Cisco as we discuss trends and drivers for a modern data architecture. Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around Cisco-based big data architectures and Hortonworks Data Platform to get you started on building your modern data architecture.
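Similar to Track B-3 Deconstructing Big Data Architecture - Server and Network Resource Planning for Big Data Systems (20)
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution Etu Solution
Speaker: Informatica Senior Product Consultant | 尹寒柏
Overview: In the Big Data era, what counts is not the volume of data but the depth of understanding of it. Now that Big Data technology has matured, CXOs without an IT background can turn CI (Customer Intelligence), once a piece of jargon, into a verb: moving from BI to CI, connecting more closely with the pulse of the consumer economy, and gaining insight into customer intent. One mindset matters in the Big Data era, though: in the end, competition is not only about growth in data volume but about who understands the data more deeply. Informatica is the best answer to this. With Informatica, we relieve the enormous pressure enterprises face to deliver trustworthy data in a timely way. As data volume and complexity keep rising, Informatica can also provide technology that gathers data faster, making the data meaningful and usable by the enterprise to improve efficiency, refine quality, ensure certainty, and exploit its advantages. Informatica offers a faster, more effective way to achieve this goal and is the SYSTEX Group's best tool in the Big Data era.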
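Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge - Splunk Helps You Solve Big Data... Etu Solution
Speaker: SYSTEX Data Value-Added Application Development Department Product Manager | 陶靖霖
Overview: Face reality! Big Data is a buzzword and a hot topic, but the core of the problem still revolves around the processes, architecture, and technology of data processing. What challenges will users run into when they step into the Big Data field? Splunk has been called "the world's best Big Data Company"; what unique technical advantages does it have in the data processing pipeline that help users overcome these challenges? And in which application cases has it successfully helped users extract value from data? Come and get to know Splunk and Big Data success stories from around the world.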
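Big Data Taiwan 2014 Keynote 4: Monetize Enterprise Data – Classic Applications and Actions of Big Data in Taiwan Etu Solution
Speaker: Etu Senior Director | 陳育杰
Overview: Over the past two years, the application architecture for Big Data in the enterprise has gradually taken shape. We have seen different industries begin using Hadoop to solve different problems, while the IT architectures behind them share some common traits. Through these common architectures, we will explore the concrete enterprise applications of Big Data / Hadoop.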
Big Data Taiwan 2014 Keynote 2: Hadoop and the Future of Data Management Etu Solution
Speaker:
1. Christopher Poulos | Vice President, Asia Pacific and Japan at Cloudera
2. Gab Gennai | Technical Services Director, Asia Pacific and Japan at Cloudera
Introduction: Apache Hadoop is without doubt leading the way in enterprise architecture. Find out how easily it integrates with your existing hardware and software infrastructure.
Search and Society: Reimagining Information Access for Radical Futures Bhaskar Mitra
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
UiPath Test Automation using UiPath Test Suite series, part 4 DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. The webinar also covers the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Attendees can expect to deepen their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T... BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdf Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with Parameters Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
JMeter webinar - integration with InfluxDB and Grafana RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
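To make the integration more concrete, here is a minimal sketch of how a single load-test sample could be serialized into InfluxDB line protocol, the text format InfluxDB accepts for writes, which Grafana can then query and chart. The function `to_line`, the measurement name `jmeter_sample`, and the tag and field names are hypothetical illustrations; they are not the schema that JMeter itself writes.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in `chars` wherever it occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: object) -> str:
    """Render a field value using line-protocol type conventions."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'                # strings are double-quoted


def to_line(measurement: str, tags: dict[str, str],
            fields: dict[str, object], timestamp_ns: int) -> str:
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line(
        "jmeter_sample",                               # hypothetical measurement name
        {"label": "GET /home", "status": "ok"},        # hypothetical tags
        {"elapsed_ms": 123, "success": True},          # hypothetical fields
        1700000000000000000,
    ))
```

Tags are indexed and suit low-cardinality labels such as the request name, while fields hold the measured values such as response time.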
Neuro-symbolic is not enough, we need neuro-*semantic* Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
GraphRAG is All You need? LLM & Knowledge Graph Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
State of ICS and IoT Cyber Threat Landscape Report 2024 preview Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
"Impact of front-end architecture on development cost", Viktor Turskyi Fwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
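Track B-3 Deconstructing Big Data Architecture - Server and Network Resource Planning for Big Data Systems
1. Deconstructing Big Data Architecture
Server and Network Resource Planning for Big Data Systems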
“How to eat an elephant – one byte at a time”
CP Li 李俊邦
Enterprise Technologist
Enterprise Solutions & Alliances, Greater China
Dell
4. Server Roles – Name Nodes
• Stores the HDFS metadata
• Job Manager for YARN data-processing framework
• Primary
– Heartbeats from data nodes
– Every 10th heartbeat is a block report from which it generates metadata
• Standby
– Checks in every hour to mirror metadata / block map
– Not a hot-spare – requires manual fail-over
• High Availability (HA) can be added in some distributions
– Results in a dedicated HA node that acts as a witness to the Name Node cluster
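5. Server Roles - Edge Nodes
• Main entry and exit point for data moving into and out of the Hadoop cluster
• Scalable
• The only nodes in the Hadoop cluster attached to multiple network segments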
[Diagram: PowerEdge R730 servers act as the Name Node, Standby Name Node, HA Node, and Edge Node(s); PowerEdge R730XD servers act as the Data Nodes. Networks shown: Corporate Network and Data Network.]
6. Server Roles - Data Node
• Primary storage location for HDFS
• Runs the data processing assigned by YARN resource management
• Key attributes
– Memory
› 64GB standard
› Additional services (Impala/Spark) require more memory
– Many local disks (JBOD / Non-RAID mode)
› SFF (2.5”) for performance-based workloads
› LFF (3.5”) for capacity-centric workloads
– CPUs – legacy recommendation of 1:1 core:spindle ratio
› SSDs, faster HDDs (10K+), and in-memory workloads make this less of an issue
› 10- and 12-core CPUs are the best-practice default
9. Hadoop Cluster Deployment – Installation Best Practices
• Use pre-built, assembled & cabled racks from vendor
• Automated deployment tools (e.g., Open Crowbar)
• Purchase nodes in standard size groups for easy capacity growth and ordering, not in single node increments
– Common increments are ½ or full rack for easy deployment and sizing
• For each type of hardware, purchase spare components to keep on site for easy, rapid repair
15. HDFS Capacity
• HDFS protects information through replication of the data between nodes. The default Replication Factor is 3, but it is configurable.
• HDFS Raw Capacity = Number of Compute Nodes x Number of Drives x Capacity of Drives
• HDFS Usable Capacity = HDFS Raw Capacity/Replication Factor
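The two capacity formulas can be tried out with a short Python sketch. The function names and the example inputs (12 nodes, 12 drives per node, 4 TB drives) are illustrative assumptions, not figures from the slides, and the calculation ignores any space reserved for the operating system or temporary data.

```python
def hdfs_raw_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Raw capacity = number of nodes x number of drives x capacity of drives."""
    return nodes * drives_per_node * drive_tb


def hdfs_usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Usable capacity = raw capacity / replication factor (default 3)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication_factor


if __name__ == "__main__":
    # Hypothetical cluster: 12 nodes, 12 drives each, 4 TB per drive.
    raw = hdfs_raw_capacity_tb(nodes=12, drives_per_node=12, drive_tb=4.0)
    usable = hdfs_usable_capacity_tb(raw)
    print(f"raw: {raw:.0f} TB, usable at RF=3: {usable:.0f} TB")
```

Lowering the replication factor raises usable capacity but reduces the number of copies protecting each block.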
16. Big Data Networking Best Practices
• Traditional Ethernet is used since it’s affordable and already prevalent.
• 1GbE networking was used initially in early drafts of the solution, but with the reduction in cost it’s much more efficient to go with 10GbE.
• Multiple ports are teamed both for redundancy and throughput. LACP or software bonding are the most common methods.
• IPv4 is most widely used. IPv6 has limited support at the OS and Hadoop level.
17. Attributes of a Good Switch for Big Data
• Non-blocking backplane
• Deep per-port packet buffers (shared buffers do not work well). During the sort/shuffle phases of map/reduce operations, network traffic is so chaotic that it can saturate any and all shared buffers, impacting the network performance of multiple hosts.
• Good choices:
– 1GbE
› S55
› S60
– 10GbE
› S4810
› S5000
– 40GbE
› Z9000
› Z9500
› S6000
20. Dell Points of Integration
• VLT / VRRP is a very affordable way to team switches both at the ToR and the aggregation tiers. This makes the Dell Networking Force10 switches a great choice.
• Active Fabric Manager
– Speeds up the creation and administration of the required VLT / VRRP configuration on the switches.
– Helps with capacity planning as customers scale
21. Big Data Networking Futures
• 40GbE onboard LOMs will begin to be used for high-volume clusters. Right now the cost:benefit ratio isn’t there yet.
• As HPC and Big Data converge, we’ll start to see the use of IB for node-to-node connectivity.
• In-memory (Spark / Impala) workloads are reducing the bottlenecks that used to exist at the disk; the bottlenecks now move to the processor and network. Expect customers to look at increasing core counts and network speed to overcome this.
22. Etu+Dell = complete Hadoop/Big Data solution provider
• Best of breed
• Cloudera partners
• Etu: Analytic software solutions for Big Data
• Dell Professional Services for Big Data
• Dell PowerEdge 13G servers
• Dell Networking solutions
• Installation and configuration service
• Complete end-to-end implementation
• Discover, Plan, Implement, Investigate
23. Solution architecture
[Diagram: four stages: 1. Integrate, 2. Store, 3. Analyze, 4. Act.]
• Toad Data Point: Desktop – integrate, cleanse
• Dell Boomi: Cloud – integrate, correlate
• Toad Intelligence Central: Data aggregation and virtualization
• Dell Statistica Big Data: Desktop – crawl, save
• Dell STATISTICA
• Other labels: Customer data; Order data; Events; Stock market data; Social Media; Advanced Analytics; Marketing campaigns; Analytical output
24. Futures
• Speed Improvements in Map / Reduce
• More in-memory workloads
– Possible move to Spark to replace Map/Reduce
• Virtualized Hadoop
– VMware Big Data Extensions
– OpenStack Sahara
– Microsoft HDInsight (Hortonworks)
25. Dell In-Memory Appliance for Cloudera Enterprise
Configurations at a glance
• Starter Configuration: 8 Node Cluster
– PowerEdge R720: 4 Infrastructure Nodes with ProSupport
– PowerEdge R720XD: 4 Data Nodes with ProSupport
– Cloudera Enterprise
– Force10 S4810P
– Force10 S55
– Dell Rack 42U
– ~176TB (disk raw space)
• Mid-Size Configuration: 16 Node Cluster
– PowerEdge R720: 4 Infrastructure Nodes with ProSupport
– PowerEdge R720XD: 12 Data Nodes with ProSupport
– Cloudera Enterprise
– Force10 S4810P
– Force10 S55
– Dell Rack 42U
– ~528TB (disk raw space)
• Small Enterprise Configuration: 24 Node Cluster
– PowerEdge R720: 4 Infrastructure Nodes with ProSupport
– PowerEdge R720XD: 20 Data Nodes with ProSupport
– Cloudera Enterprise
– Force10 S4810P
– Force10 S55
– Dell Rack 42U
– ~880TB (disk raw space)
• Expansion Unit: PowerEdge R720XD, 4 Data Nodes with ProSupport, Cloudera Enterprise; scales in blocks
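As a rough sizing aid, the sketch below estimates how many 4-data-node expansion units a Starter Configuration would need to reach a target usable HDFS capacity. It assumes about 44 TB of raw disk per data node, which is what the listed figures imply (176 TB / 4, 528 TB / 12, and 880 TB / 20 all give 44), and the default Replication Factor of 3. The function names are hypothetical, and the estimate ignores operating system and temporary-space overhead.

```python
import math

RAW_TB_PER_DATA_NODE = 176 / 4   # ~44 TB, derived from the listed raw-space figures
EXPANSION_BLOCK_NODES = 4        # one expansion unit adds 4 data nodes
REPLICATION_FACTOR = 3           # HDFS default; configurable


def usable_tb(data_nodes: int) -> float:
    """Approximate usable HDFS capacity for a given number of data nodes."""
    return data_nodes * RAW_TB_PER_DATA_NODE / REPLICATION_FACTOR


def expansion_units_needed(target_usable_tb: float, base_data_nodes: int = 4) -> int:
    """Expansion units to add to a base configuration to reach the target."""
    needed_raw_tb = target_usable_tb * REPLICATION_FACTOR
    needed_nodes = math.ceil(needed_raw_tb / RAW_TB_PER_DATA_NODE)
    extra_nodes = max(0, needed_nodes - base_data_nodes)
    return math.ceil(extra_nodes / EXPANSION_BLOCK_NODES)


if __name__ == "__main__":
    for nodes in (4, 12, 20):
        print(f"{nodes} data nodes -> ~{usable_tb(nodes):.0f} TB usable")
    print("units for 100 TB usable:", expansion_units_needed(100))
```

Because capacity grows in whole blocks, the result is rounded up, so the purchased capacity will usually exceed the target.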