This presentation was delivered, in one form or another, in multiple forums. It is a short introduction to Big Data and the Hadoop ecosystem for Oracle personnel (DBAs and DB developers).
In the agenda:
• What is the Big Data challenge?
• A Big Data Solution: Apache Hadoop
• HDFS
• MapReduce and YARN
• Hadoop Ecosystem: HBase, Sqoop, Hive, Pig and other tools
• Another Big Data Solution: Apache Spark
• Where does the DBA fit in?
This presentation was presented in DOAG 2016, HROUG 2016, BGOUG 2016, ILOUG Tech Days 2016 and other small private sessions (Israel Technology Police leaders, CIO forum, Amdocs, and others).
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem - Zohar Elkayam
Session I presented at BGOUG in June 2016.
Big data is one of the biggest buzzwords in today's market. Terms such as Hadoop, HDFS, YARN, Sqoop, and non-structured data have been scaring DBAs since 2010, but where does the DBA team really fit in?
In this session, we will discuss everything database administrators and database developers need to know about big data. We will demystify the Hadoop ecosystem and explore the different components. We will learn how HDFS and MapReduce are changing the data world and where traditional databases fit into the grand scheme of things. We will also talk about why DBAs are the perfect candidates to transition into big data and Hadoop professionals and experts.
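The map/shuffle/reduce phases mentioned above can be sketched in a few lines of plain Python. This is a toy word-count illustration of the programming model, not the Hadoop API; the documents and names are made up:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["Hadoop stores data in HDFS", "MapReduce processes data in parallel"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["data"])  # → 2
```

In real Hadoop the same three phases run distributed across the cluster, with HDFS holding the input splits and the shuffle moving data between nodes.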
Introduction to Big Data and NoSQL.
This presentation was given to the Master DBA course at John Bryce Education in Israel.
Work is based on presentations by Michael Naumov, Baruch Osoveskiy, Bill Graham and Ronen Fidel.
Exploring Oracle Multitenant in Oracle Database 12c - Zohar Elkayam
Oracle Multitenant architecture is one of the biggest changes in Oracle 12c. In this presentation, we will review this major change and see how it can be effective for daily use.
The agenda:
- The Multitenant Container Database Architecture
- Multitenant Benefits and Impacts
- CDB and PDB Deployments and Provisioning
- Tools and Self-service tools
This presentation is based on the work of Ami Aharonovich and was adapted with his permission.
Docker Concepts for Oracle/MySQL DBAs and DevOps - Zohar Elkayam
Oracle Week 2017 Slides
Agenda:
Docker overview – why do we even need containers?
Installing Docker and getting started
Images and Containers
Docker Networks
Docker Storage and Volumes
Oracle and Docker
Docker tools, GUI and Swarm
Adding Real-Time Reporting to Your Database: Oracle DB In-Memory - Zohar Elkayam
This is a presentation I gave at the UKOUG Scotland user conference in June 2015. It describes a proof of concept we did for Clarizen with the Oracle Database 12c In-Memory option.
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem 20170527 - Zohar Elkayam
Big data is one of the biggest buzzwords in today's market. Terms such as Hadoop, HDFS, YARN, Sqoop, and non-structured data have been scaring DBAs since 2010, but where does the DBA team really fit in?
In this session, we will discuss everything database administrators and database developers need to know about big data. We will demystify the Hadoop ecosystem and explore the different components. We will learn how HDFS and MapReduce are changing the data world and where traditional databases fit into the grand scheme of things. We will also talk about why DBAs are the perfect candidates to transition into big data and Hadoop professionals and experts.
This is the presentation I gave at Kscope17 on June 27, 2017.
Introduction to Oracle Data Guard Broker - Zohar Elkayam
This is an old deck I recently refreshed for a customer session. It is an introduction to the Oracle Data Guard Broker feature: how to deploy it, how to use it, and what its benefits are.
This presentation is based on version 11g, but most of it also applies to Oracle 12c.
Agenda:
- Oracle Data Guard overview
- Data Guard Broker introduction
- Configuring and using the Data Guard Broker
- Live Demos
The Hadoop Ecosystem for Developers, a session given at DevGeekWeek in Israel.
This was a day-long session about big data problems and the Hadoop solution. We also talked about Spark and NoSQL.
MySQL 5.7 New Features for Developers, a session for DOAG (Oracle user group conference) in 2016. A similar version was also presented at the Israel MySQL User Group in November 2016.
This presentation reviews new features in MySQL 5.7: the optimizer, the InnoDB engine, the native JSON data type, and the performance and sys schemas.
Oracle Database In-Memory Option for ILOUG - Zohar Elkayam
Oracle 12.1.0.2 introduced a new feature: the Oracle Database In-Memory option (DBIM).
This is the presentation given at the ILOUG DBA SIG, where I introduced the technology and how to use it.
MySQL can now be used as a document store, combining the flexibility of the document store model with the power of the relational model. You’ll understand why you’ll be able to choose MySQL for your Relational AND Document Store needs, avoiding significant trade-offs and being forced into choosing multiple solutions.
Fast, Flexible Application Development with Oracle Database Cloud Service - Gustavo Rene Antunez
Developing applications to run on the most important database manager in the world? Why not do it in the cloud? With Oracle Database Cloud Service, developers can quickly and easily access the power and flexibility of the Oracle database in the cloud. With a choice between an instance or a dedicated database with full administrative control, or a schema dedicated to a development platform with full deployment managed by Oracle, developers can decide how much control they have over their development environments. Attend this session to learn more about the features and benefits of Oracle Database Cloud.
MySQL 8.0 is the latest Generally Available version of MySQL. This session will give a brief introduction to MySQL 8.0 and help you upgrade from older versions: understand what utilities are available to make the process smoother, and what to bear in mind with the new version, including possible behaviour changes and their solutions. It really is a simple process.
Databases require capacity planning (for those coming from traditional RDBMS solutions, this can be thought of as a sizing guide). Capacity planning prevents resource exhaustion, and it can be hard. This talk leans more heavily on MySQL, but the concepts and addendum will help with any other data store.
Rapid Cluster Computing with Apache Spark 2016 - Zohar Elkayam
This is the presentation I used for the Oracle Week 2016 session about Apache Spark.
In the agenda:
- The Big Data problem and possible solutions
- Basic Spark Core
- Working with RDDs
- Working with Spark Cluster and Parallel programming
- Spark modules: Spark SQL and Spark Streaming
- Performance and Troubleshooting
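The RDD programming model covered in the session can be illustrated with a toy, single-process Python class. This is a hypothetical stand-in, not the PySpark API; it only shows the key idea of lazy transformations and eager actions:

```python
class MiniRDD:
    """A toy, single-process stand-in for a Spark RDD: transformations
    are lazy (they only build a pipeline), and actions trigger computation."""

    def __init__(self, data, pipeline=None):
        self._data = data
        self._pipeline = pipeline or []

    def map(self, fn):
        # Transformation: nothing runs yet, we just record the step.
        return MiniRDD(self._data, self._pipeline + [("map", fn)])

    def filter(self, fn):
        return MiniRDD(self._data, self._pipeline + [("filter", fn)])

    def collect(self):
        # Action: replay the accumulated transformations over the data.
        items = self._data
        for kind, fn in self._pipeline:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

rdd = MiniRDD(range(10))
result = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()
print(result)  # → [0, 4, 16, 36, 64]
```

In real Spark the same chain is distributed: each transformation extends a lineage graph, and the action schedules it across the cluster's executors.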
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle... - Rittman Analytics
Most DBAs are aware something interesting is going on with big data and the Hadoop product ecosystem that underpins it, but aren't so clear about what each component in the stack does, what problem each part solves, and why those problems couldn't be solved using the old approach. We'll look at where it's all going with the advent of Spark and machine learning, what's happening with ETL, metadata and analytics on this platform, and why IaaS and data-warehousing-as-a-service will have such a big impact, sooner than you think.
A summarized version of a presentation on Big Data architecture, covering everything from the Big Data concept to Hadoop and tools like Hive, Pig, and Cassandra.
This is a presentation on Apache Hadoop technology. It may be helpful for beginners learning Hadoop terminology, and it contains pictures describing how the technology works.
Thank you.
This presentation is about Apache Hadoop technology and may be helpful for beginners: it covers some Hadoop terminology and includes diagrams showing how the technology works.
Thank you.
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs - Zohar Elkayam
Oracle Week 2017 slides.
Agenda:
Basics: How and What To Tune?
Using the Automatic Workload Repository (AWR)
Using AWR-Based Tools: ASH, ADDM
Real-Time Database Operation Monitoring (12c)
Identifying Problem SQL Statements
Using SQL Performance Analyzer
Tuning Memory (SGA and PGA)
Parallel Execution and Compression
Oracle Database 12c Performance New Features
The art of querying – newest and advanced SQL techniques - Zohar Elkayam
Presentation from Oracle Week 2017.
Agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Regular Expressions
Oracle 12c new rows pattern matching
XML and JSON handling with SQL
Oracle 12c (12.1 + 12.2) new features
SQL Developer Command Line tool (if time allows)
Oracle 18c
Oracle Advanced SQL and Analytic Functions - Zohar Elkayam
Even though DBAs and developers are writing SQL queries every day, it seems that advanced SQL techniques such as multi-dimension aggregation and analytic functions still remain relatively unknown. In this session, we will explore some of the common real-world usages for analytic functions and understand how to take advantage of this great and useful tool. We will deep dive into ranking based on values and groups, understand aggregation of multiple dimensions without a GROUP BY, see how to do inter-row calculations, and much more.
These are the slides presented at Kscope17 on June 28, 2017.
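As a taste of the analytic-function idea described above: ranking within groups without collapsing rows. The session uses Oracle SQL; the sketch below uses SQLite's similar window-function support (SQLite 3.25+, bundled with Python 3.7+) through the standard library, with made-up sample data:

```python
import sqlite3

# Analytic (window) functions: rank employees by salary within each
# department, and total each department's salary, WITHOUT collapsing
# rows the way a GROUP BY would. Table and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("Ann", "IT", 300), ("Ben", "IT", 200),
    ("Cai", "HR", 250), ("Dor", "HR", 250),
])
rows = conn.execute("""
    SELECT name, dept, salary,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
           SUM(salary) OVER (PARTITION BY dept) AS dept_total
    FROM emp ORDER BY dept, rnk
""").fetchall()
for row in rows:
    print(row)
# Each row keeps its detail columns; ties (Cai, Dor) share rank 1.
```

The same `RANK() OVER (PARTITION BY ... ORDER BY ...)` shape works in Oracle, where the session also covers inter-row functions such as LAG and LEAD.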
Things Every Oracle DBA Needs to Know About the Hadoop Ecosystem (c17lv version) - Zohar Elkayam
Big data is one of the biggest buzzwords in today's market. Terms such as Hadoop, HDFS, YARN, Sqoop, and non-structured data have been scaring DBAs since 2010, but where does the DBA team really fit in?
In this session, we will discuss everything database administrators and database developers need to know about big data. We will demystify the Hadoop ecosystem and explore the different components. We will learn how HDFS and MapReduce are changing the data world and where traditional databases fit into the grand scheme of things. We will also talk about why DBAs are the perfect candidates to transition into big data and Hadoop professionals and experts.
Learning Objective #1: What is the Big Data challenge
Learning Objective #2: Learn about Hadoop - HDFS, MapReduce and Yarn
Learning Objective #3: Understand where a DBA fits in this world
Oracle 12c New Features For Better Performance - Zohar Elkayam
Oracle 12cR1 and 12cR2 came with some great features for better performance and scaling. In this session we will talk about some of the new features that can greatly improve performance: optimizer changes, adaptive plan improvements, changes to statistics gathering, and Oracle 12cR2's new sharding option.
On the agenda:
- Oracle Database In Memory (Column Store)
- Oracle Sharding (12.2.0.1)
- Optimizer changes in 12c
- Statistics changes in 12c.
Presented first at ilOUG - Israel Oracle User Group meetup in February 2017.
[including the promised hidden slide :) ]
Advanced PL/SQL Optimizing for Better Performance 2016 - Zohar Elkayam
This is the presentation I used for the Oracle Week 2016 session. It includes new features from both 12cR1 and 12cR2.
Agenda:
Developing PL/SQL:
- Composite datatypes, advanced cursors, dynamic SQL, tracing, and more…
Compiling PL/SQL:
- dependencies, optimization levels, and DBMS_WARNING
Tuning PL/SQL:
- GTT, Result cache and Memory handling
- Oracle 11g, 12cR1 and 12cR2 new useful features
- SQLcl – New replacement tool for SQL*Plus (if we have time)
This is a presentation from Oracle Week 2016 (Israel). It is a newer version of last year's deck, with new 12cR2 features and a demo.
In the agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Regular Expressions
Oracle 12c new rows pattern matching
XML and JSON handling with SQL
Oracle 12c (12.1 + 12.2) new features
SQL Developer Command Line tool
OOW2016: Exploring Advanced SQL Techniques Using Analytic Functions - Zohar Elkayam
This is the presentation I gave at Oracle OpenWorld 2016; the topic was group functions and analytic functions.
We talked about reporting analytic functions, ranking, and a couple of Oracle 12c new features such as the top-N query syntax and pattern matching.
This presentation has the bonus slides which were not presented at the event itself, as promised.
Is SQLcl the Next Generation of SQL*Plus? - Zohar Elkayam
Session I presented at ILOUG in May 2016.
Introducing the new tool from the developers of SQL Developer: SQLcl, a new command-line tool from the SQL Developer team that might replace SQL*Plus, which has been around for over 30 years, and all of its functions!
In this session, we will explore the new functionality of SQLcl and use a live demonstration to show what SQLcl has to offer over the old SQL*Plus. We will use real-life examples to see what makes this tool such a time saver in day-to-day tasks for DBAs and developers who prefer the command-line interface.
Exploring Advanced SQL Techniques Using Analytic Functions - Zohar Elkayam
Session I presented at ILOUG in May 2016.
Even though DBAs and developers are writing SQL queries every day, it seems that advanced SQL techniques such as multi-dimension aggregation and analytic functions still remain relatively unknown. In this session, we will explore some of the common real-world usages for analytic functions and understand how to take advantage of this great and useful tool. We will deep dive into ranking based on values and groups; understand aggregation of multiple dimensions without a GROUP BY; see how to do inter-row calculations; and much more.
Together we will see how we can unleash the power of analytics using Oracle 11g best practices and Oracle 12c new features.
Exploring Advanced SQL Techniques Using Analytic Functions - Zohar Elkayam
Session I presented at BGOUG in June 2016.
Even though DBAs and developers are writing SQL queries every day, it seems that advanced SQL techniques such as multi-dimension aggregation and analytic functions still remain relatively unknown. In this session, we will explore some of the common real-world usages for analytic functions and understand how to take advantage of this great and useful tool. We will deep dive into ranking based on values and groups; understand aggregation of multiple dimensions without a GROUP BY; see how to do inter-row calculations; and much more.
Together we will see how we can unleash the power of analytics using Oracle 11g best practices and Oracle 12c new features.
Advanced PLSQL Optimizing for Better Performance - Zohar Elkayam
A Presentation from Oracle Week 2015 in Israel
Agenda:
• Developing PL/SQL:
o Composite Data Types: Records, Collections and Table type
o Advanced Cursors: Ref cursor, Cursor function, Cursor subquery in PL/SQL
o Bulk Binding
o Dynamic SQL – SQL Injection
o Tracing PL/SQL Execution
o Design patterns for PL/SQL: Autonomous Transactions, Invoker and Definer rights, serially_reusable code
o Triggers Improvements
• Compiling PL/SQL:
o PL/SQL Fine-Grain Dependency Management
o PLSQL_OPTIMIZE_LEVEL parameter
o PL/SQL Compile-Time Warnings and Using DBMS_WARNING package
• Tuning PL/SQL:
o Handling Packages in Memory
o Global Temporary Tables
o PL/SQL Function Result Cache and pitfalls
• Oracle Database 12c PL/SQL new features: What is new in Oracle 12c
o Language Usability Enhancements
o New Limitations
• Additional useful features, Tips and Tricks for better performance
Oracle Week 2015 presentation (Presented on November 15, 2015)
Agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Oracle 12c new rows pattern matching feature
XML and JSON handling with SQL
Regular Expressions
SQLcl – a new replacement tool for SQL*Plus from Oracle
This is a presentation I gave at the UKOUG user conference in Scotland. SQLcl is a new command-line tool from the developers of SQL Developer at Oracle. This presentation is accompanied by a live demo that can be downloaded from my blog.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
The New Frontiers of AI in RPA with UiPath Autopilot™ - UiPath Community
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also had a lovely workshop where participants tried out different ways to think about quality and testing in different parts of the DevOps infinity loop.
2. Who am I?
• Zohar Elkayam, CTO at Brillix
• Programmer, DBA, team leader, database trainer, public
speaker, and a senior consultant for over 18 years
• Oracle ACE Associate
• Part of ilOUG – Israel Oracle User Group
• Involved with Big Data projects since 2011
• Blogger – www.realdbamagic.com and www.ilDBA.co.il
2 http://brillix.co.il
3. About Brillix
• We offer complete, integrated end-to-end solutions based on best-of-breed
innovations in database, security, and big data technologies
• We provide complete end-to-end 24x7 expert remote database
services
• We offer professional, customized on-site training, delivered by our
top-notch, world-recognized instructors
5. Agenda
• What is the Big Data challenge?
• A Big Data Solution: Apache Hadoop
• HDFS
• MapReduce and YARN
• Hadoop Ecosystem: HBase, Sqoop, Hive, Pig and other tools
• Another Big Data Solution: Apache Spark
• Where does the DBA fit in?
8. Volume
• Big data comes in one size: big
• Size is measured in terabytes (10^12), petabytes (10^15),
exabytes (10^18), and zettabytes (10^21)
• Storing and handling the data becomes an issue
• Producing value out of the data in a reasonable time is an
issue
9. Variety
• Big Data extends beyond structured data to include semi-structured
and unstructured information: logs, text, audio, and video
• A wide variety of rapidly evolving data types requires highly flexible
stores and handling
Un-Structured        Structured
Objects              Tables
Flexible             Columns and Rows
Structure Unknown    Predefined Structure
Textual and Binary   Mostly Textual
10. Velocity
•The speed at which data is being generated and
collected
•Streaming data and large-volume data movement
•High velocity of data capture – requires rapid
ingestion
•Might cause a backlog problem
11. Value
Big data is not about the size of the data,
it's about the value within the data
12. So, We Define the Big Data Problem…
• When the data is too big or moves too fast to handle in a
sensible amount of time
• When the data doesn’t fit any conventional database
structure
• When we think that we can still produce value from that
data and want to handle it
• When the technical solution to the business need becomes
part of the problem
15. Big Data in Practice
•Big data is big: technological framework and
infrastructure solutions are needed
•Big data is complicated:
• We need developers to manage handling of the data
• We need DevOps engineers to manage the clusters
• We need data analysts and data scientists to produce
value
16. Possible Solutions: Scale Up
• Older solution: using a giant server with a lot of resources
(scale up: more cores, faster processors, more memory) to
handle the data
• Process everything on a single server with hundreds of CPU
cores
• Use lots of memory (1+ TB)
• Have a huge data store on high end storage solutions
• Data needs to be copied to the processes in real time, so
it’s no good for high amounts of data (Terabytes to
Petabytes)
17. Another Solution: Distributed Systems
•A scale-out solution: let's use distributed systems:
use multiple machines for a single job/application
•More machines means more resources
• CPU
• Memory
• Storage
•But the solution is still complicated: infrastructure
and frameworks are needed
18. Distributed Infrastructure Challenges
• We need infrastructure that is built for:
• Large-scale
• Linear scale out ability
• Data-intensive jobs that spread the problem across clusters of server
nodes
• Storage: efficient and cost-effective enough to capture and store
terabytes, if not petabytes, of data
• Network infrastructure that can quickly import large data sets and
then replicate them to various nodes for processing
• High-end hardware is too expensive - we need a solution that uses
cheaper hardware
19. Distributed System/Frameworks Challenges
•How do we distribute our workload across the
system?
•Programming complexity – keeping the data in sync
•What to do with faults and redundancy?
•How do we handle security demands to protect
highly-distributed infrastructure and data?
21. Apache Hadoop
•Open source project run by the Apache Software Foundation (2006)
•Hadoop brings the ability to cheaply process large
amounts of data, regardless of its structure
•It has been the driving force behind the growth of
the big data industry
•Get the public release from:
• http://hadoop.apache.org/core/
22. Original Hadoop Components
•HDFS (Hadoop Distributed File System) – distributed
file system that runs in clustered environments
•MapReduce – programming paradigm for running
processes over clustered environments
•Hadoop main idea: let’s distribute the data to many
servers, and then bring the program to the data
23. Hadoop Benefits
•Designed for scale out
•Reliable solution based on unreliable hardware
•Load data first, structure later
•Designed for storing large files
•Designed to maximize throughput of large scans
•Designed to leverage parallelism
•Solution Ecosystem
24. What Hadoop Is Not?
• Hadoop is not a database – it is not a replacement for a DW
or for other relational databases
• Hadoop is not for OLTP/real-time systems
• Very good for large amounts of data, not so much for smaller sets
• Designed for clusters – there is no Hadoop monster server
(single server)
25. Hadoop Limitations
•Hadoop is scalable but it’s not fast
•Some assembly may be required
•Batteries are not included (DIY mindset) – some
features need to be developed if they're not available
•Open source license limitations apply
•Technology is changing very rapidly
27. Original Hadoop 1.0 Components
• HDFS (Hadoop Distributed File System) – distributed file
system that runs in a clustered environment
• MapReduce – programming technique for running
processes over a clustered environment
28. Hadoop 2.0
• Hadoop 2.0 changed the Hadoop architecture and
introduced a better resource management concept:
• Hadoop Common
• HDFS
• YARN
• Multiple data processing
frameworks including
MapReduce, Spark and
others
29. HDFS is...
• A distributed file system
• Designed to reliably store data using commodity hardware
• Designed to expect hardware failures and still stay
resilient
• Intended for larger files
• Designed for batch inserts and appending data (no
updates)
30. Files and Blocks
•Files are split into 128MB blocks (single unit of
storage)
• Managed by NameNode and stored on DataNodes
• Transparent to users
•Replicated across machines at load time
• Same block is stored on multiple machines
• Good for fault-tolerance and access
• Default replication factor is 3
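For back-of-the-envelope sizing, the numbers above combine simply: the block count is the file size divided by 128MB (rounded up), and raw cluster usage is roughly the file size times the replication factor. A minimal Python sketch (the 1TB file size is a made-up example):

```python
import math

BLOCK_SIZE_MB = 128      # HDFS default block size
REPLICATION_FACTOR = 3   # HDFS default replication factor

def hdfs_footprint(file_size_mb):
    """Return (block_count, raw_storage_mb) for a file stored in HDFS."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    # Every block is replicated, so raw usage is roughly size * replication
    raw_storage = file_size_mb * REPLICATION_FACTOR
    return blocks, raw_storage

# A hypothetical 1 TB (1,048,576 MB) file:
blocks, raw = hdfs_footprint(1_048_576)
print(blocks, raw)  # 8192 blocks, 3,145,728 MB (~3 TB) of raw cluster storage
```

Note that even a 1-byte file still occupies one block entry in the NameNode, which is exactly why billions of small files are a problem (see the next slides).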
31. HDFS is Good for...
•Storing large files
• Terabytes, Petabytes, etc...
• Millions rather than billions of files
• 128MB or more per file
•Streaming data
• Write once and read-many times patterns
• Optimized for streaming reads rather than random reads
32. HDFS is Not So Good For...
• Low-latency reads / Real-time application
• High-throughput rather than low latency for small chunks of
data
• HBase addresses this issue
• Large amounts of small files
• Better for millions of large files instead of billions of small files
• Multiple Writers
• Single writer per file
• Writes at the end of files; no support for arbitrary offsets
36. MapReduce is...
• A programming model for expressing distributed
computations at a massive scale
• An execution framework for organizing and performing
such computations
• MapReduce can be written in Java, Scala, C, Python, Ruby,
and others
• Concept: Bring the code to the data, not the data to the
code
37. The MapReduce Paradigm
• Imposes key-value input/output
• We implement two main functions:
• MAP - Takes a large problem, divides it into sub-problems, and
performs the same function on all sub-problems
Map(k1, v1) -> list(k2, v2)
• REDUCE - Combine the output from all sub-problems (each key goes to
the same reducer)
Reduce(k2, list(v2)) -> list(v3)
• Framework handles everything else (almost)
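The two signatures above are easiest to see in the canonical word count example. The sketch below is a plain-Python simulation of the paradigm (no Hadoop involved): a dictionary plays the part of the framework's shuffle-and-sort step that routes each key to a single reducer:

```python
from collections import defaultdict

def map_fn(_, line):
    # Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce(k2, list(v2)) -> list(v3): sum the counts for one word
    return [(word, sum(counts))]

def run_mapreduce(lines):
    # Shuffle-and-sort: group all mapped values by key
    grouped = defaultdict(list)
    for i, line in enumerate(lines):
        for key, value in map_fn(i, line):
            grouped[key].append(value)
    # Each key goes to exactly one reducer call
    result = []
    for key in sorted(grouped):
        result.extend(reduce_fn(key, grouped[key]))
    return dict(result)

print(run_mapreduce(["to be or not to be"]))
# {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

In real Hadoop the map and reduce calls run on different nodes and the "grouped" data moves across the network, but the contract the programmer implements is exactly these two functions.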
39. YARN
•Takes care of distributed processing and coordination
•Scheduling
• Jobs are broken down into smaller chunks called tasks
• These tasks are scheduled to run on data nodes
•Task Localization with Data
• Framework strives to place tasks on the nodes that host
the segment of data to be processed by that specific task
• Code is moved to where the data is
40. YARN
•Error Handling
• Failures are an expected behavior so tasks are
automatically re-tried on other machines
•Data Synchronization
• Shuffle and Sort barrier re-arranges and moves data
between machines
• Input and output are coordinated by the framework
41. Submitting a Job
•The yarn script, given a jar file and a class argument, launches
a JVM and executes the provided job
$ yarn jar HadoopSamples.jar mr.wordcount.StartsWithCountJob \
    /user/sample/hamlet.txt /user/sample/wordcount/
44. Hadoop Main Problems
• The Hadoop MapReduce framework (not the MapReduce
paradigm) had some major problems:
• Developing MapReduce was complicated – there was more
than just business logic to develop
• Transferring data between stages requires the intermediate
data to be written to disk (and then read by the next step)
• Multi-step needed orchestration and abstraction solutions
• Initial resource management was very painful – MapReduce
framework was based on resource slots
46. Improving Hadoop: Distributions
• Core Hadoop is complicated so some tools and solution
frameworks were added to make things easier
• There are over 80 different Apache projects for big data
solutions that use Hadoop (and growing!)
• Hadoop Distributions collect some of these tools and
release them as a complete integrated package
• Cloudera
• HortonWorks
• MapR
• Amazon EMR
48. Improving Programmability
•MapReduce code in Java is sometimes tedious, so
different solutions came to the rescue
• Pig: Programming language that simplifies Hadoop
actions: loading, transforming and sorting data
• Hive: enables Hadoop to operate as data warehouse using
SQL-like syntax
• Spark and other frameworks
49. Pig
• Pig is an abstraction on top of Hadoop
• Provides high level programming language designed for data
processing
• Scripts converted into MapReduce code, and executed on the
Hadoop Clusters
• Makes ETL/ELT processing and other simple MapReduce jobs
easier without writing MapReduce code
• Pig was widely accepted and used by Yahoo!, Twitter, Netflix,
and others
• Often replaced by more up-to-date tools like Apache Spark
50. Hive
• Data Warehousing Solution built on top of Hadoop
• Provides SQL-like query language named HiveQL
• Minimal learning curve for people with SQL expertise
• Data analysts are target audience
• Early Hive development work started at Facebook in 2007
• Hive is an Apache top level project under
Hadoop
• http://hive.apache.org
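Because HiveQL deliberately mirrors SQL, an analyst's query reads like any other SQL statement. The snippet below uses Python's built-in sqlite3 module purely to illustrate the style of aggregate query one would run in Hive; the page_views table and its columns are invented for the example, and sqlite3 is only a stand-in, not Hive itself:

```python
import sqlite3

# In-memory database standing in for a Hive warehouse table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("alice", "/home"), ("bob", "/home"), ("alice", "/about")],
)

# The same GROUP BY aggregate would be valid HiveQL over an HDFS-backed table
rows = conn.execute(
    "SELECT url, COUNT(*) AS views FROM page_views "
    "GROUP BY url ORDER BY views DESC"
).fetchall()
print(rows)  # [('/home', 2), ('/about', 1)]
```

The difference is what happens underneath: Hive compiles such a statement into distributed jobs over files in HDFS, while a relational engine executes it directly.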
51. Hive Provides
•Ability to bring structure to various data formats
•Simple interface for ad hoc querying, analyzing and
summarizing large amounts of data
•Access to files on various data stores such as HDFS
and HBase
•Also see: Apache Impala (mainly in Cloudera)
52. Databases and DB Connectivity
•HBase: Online NoSQL Key/Value wide-column
oriented datastore that is native to HDFS
•Sqoop: a tool designed to import data from relational
databases into HDFS, HBase, or Hive, and export it back
•Sqoop2: Sqoop centralized service (GUI, WebUI,
REST)
53. HBase
• HBase is the closest thing we had to
database in the early Hadoop days
• Distributed key/value with wide-column oriented NoSQL
database, built on top of HDFS
• Providing Bigtable-like capabilities
• Does not have a query language: only get, put, and scan
commands
• Often compared with Cassandra
(non-Hadoop native Apache project)
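HBase's data model (sorted row keys mapping to sparse columns, accessed only through get, put, and scan) can be sketched with plain Python dicts. The class below is a toy illustration of the model, not the real HBase client API:

```python
class ToyWideColumnStore:
    """Toy key/value wide-column store: rows are keyed,
    columns are sparse, and scans walk a sorted key range."""

    def __init__(self):
        self.rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return self.rows.get(row_key, {})

    def scan(self, start, stop):
        # HBase keeps rows sorted by key; a scan reads a key range
        return {k: v for k, v in sorted(self.rows.items()) if start <= k < stop}

store = ToyWideColumnStore()
store.put("user#001", "info:name", "Alice")
store.put("user#001", "info:city", "Haifa")
store.put("user#002", "info:name", "Bob")

print(store.get("user#001"))               # both columns of row user#001
print(store.scan("user#001", "user#002"))  # only user#001 falls in the range
```

Note what is missing: no SQL, no joins, no secondary indexes. Everything hangs off the row key, which is why key design dominates HBase schema design.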
54. When Do We Use HBase?
•Huge volumes of randomly accessed data
•HBase is at its best when it’s accessed in a
distributed fashion by many clients (high
consistency)
•Consider HBase when we are loading data by key,
searching data by key (or range), serving data by key,
querying data by key or when storing data by row that
doesn’t conform well to a schema.
55. When NOT To Use HBase
•HBase doesn't use SQL, doesn't have an optimizer, and
doesn't support transactions or joins
•HBase doesn’t have data types
•See project Apache Phoenix for better data structure
and query language when using HBase
56. Sqoop and Sqoop2
• Sqoop is a command line tool for moving data
from RDBMS to Hadoop. Sqoop2 is a centralized
tool for running sqoop.
• Uses MapReduce to load the data from a relational database into HDFS
• Can also export data from HBase to RDBMS
• Comes with connectors to MySQL, PostgreSQL, Oracle, SQL
Server and DB2.
$ bin/sqoop import --connect \
    'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' \
    --table lineitem --hive-import
$ bin/sqoop export --connect \
    'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' \
    --table lineitem --export-dir /data/lineitemData
57. Improving Hadoop – More Useful Tools
•For improving coordination: Zookeeper
•For improving scheduling/orchestration: Oozie
•Fast SQL queries on Hadoop data: Apache Impala
•For Improving log collection: Flume
•Text Search and Data Discovery: Solr
•For Improving UI and Dashboards: Hue and Ambari
58. Improving Hadoop – More Useful Tools (2)
•Data serialization: Avro and Parquet
•Data governance: Atlas
•Security: Knox and Ranger
•Data Replication: Falcon
•Machine Learning: Mahout
•Performance Improvement: Tez
•And there are more…
60. Is Hadoop the Only Big Data Solution?
• No – There are other solutions:
• Apache Spark and Apache Mesos frameworks
• NoSQL systems (Apache Cassandra, CouchBase, MongoDB
and many others)
• Stream analysis (Apache Kafka, Apache Storm, Apache Flink)
• Machine learning (Apache Mahout, Spark MLlib)
• Some can be integrated with Hadoop, but some are
independent
61. Another Big Data Solution: Apache Spark
•Apache Spark is a fast, general engine for
large-scale data processing on a cluster
•Originally developed at UC Berkeley in 2009 as a
research project, and is now an open source Apache
top level project
•Main idea: use the memory resources of the cluster
for better performance
•It is now one of the fastest-growing projects today
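That main idea (chaining transformations while intermediate results stay in memory, instead of hitting disk between stages as the classic MapReduce framework does) can be illustrated with a toy, single-machine sketch. ToyRDD below merely mimics the shape of Spark's RDD API; it is not PySpark:

```python
class ToyRDD:
    """Toy stand-in for a Spark RDD: every transformation hands its
    result to the next stage in memory, with no disk writes between."""

    def __init__(self, data):
        self.data = list(data)

    def flat_map(self, fn):
        return ToyRDD(item for element in self.data for item in fn(element))

    def map(self, fn):
        return ToyRDD(fn(element) for element in self.data)

    def reduce_by_key(self, fn):
        acc = {}
        for key, value in self.data:
            acc[key] = fn(acc[key], value) if key in acc else value
        return ToyRDD(acc.items())

    def collect(self):
        return list(self.data)

# The classic word count, chained entirely in memory
lines = ToyRDD(["big data is big"])
counts = (lines.flat_map(str.split)
               .map(lambda word: (word, 1))
               .reduce_by_key(lambda a, b: a + b)
               .collect())
print(dict(counts))  # {'big': 2, 'data': 1, 'is': 1}
```

Real Spark adds what the toy leaves out: partitioning across the cluster, lazy evaluation, and recomputation from lineage when a node fails; but the chained-transformation programming style is the same.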
63. Okay, So Where Does the DBA Fit In?
• Big Data solutions are not databases. Databases are
probably not going to disappear, but we feel the change
even today: DBAs must be ready for the change
• DBAs are the perfect candidates to transition into Big Data
Experts:
• They have system (OS, disk, memory, hardware) experience
• They can understand data easily
• DBAs are used to working with developers and other data users
64. What Do DBAs Need Now?
•DBAs will need to know more programming: Java,
Scala, Python, R, or any other popular language in the
Big Data world will do
•DBAs need to understand the position shifts and
the introduction of DevOps, Data Scientists, CDOs, etc.
•Big Data is changing daily: we need to learn, read, and
be involved before we are left behind…
66. Summary
• Big Data is here – it's complicated, and the RDBMS alone no
longer fits every need
• Big Data solutions are evolving; Hadoop is one example of
such a solution
• Spark is a very popular Big Data solution
• DBAs need to be ready for the change: Big Data solutions
are not databases, and we must make ourselves ready