What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
This presentation about Apache Spark covers all the basics that a beginner needs to know to get started with Spark. It covers the history of Apache Spark, what is Spark, the difference between Hadoop and Spark. You will learn the different components in Spark, and how Spark works with the help of architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva. Now, let's get started with what is Apache Spark.
Below topics are explained in this Spark presentation:
1. History of Spark
2. What is Spark
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark architecture
6. Applications of Spark
7. Spark usecase
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course have been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn’s Apache Spark and Scala certification training are designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache and Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming and Shell Scripting Spark
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training
As businesses grow, so does the complexity of their software. New features, new models, and new background processes all continue to be added. . .and developers struggle to make sense of it all. Yet the end user demands a swift and functional experience when interacting with your application. It is paramount to be open to alternative patterns that help tame complex, high-demand services. Two such patterns are command-query responsibility segregation (CQRS) and event sourcing (ES).
Command-query responsibility segregation is an architectural pattern for user-facing applications that extends from the now standard Model-View-Controller (MVC) pattern and is an alternative to the CRUD pattern. At its core, CQRS is about changing how we think of and work with our data by introducing two types of models: all user actions become commands, and a read-only query model powers our views. Commands and queries are logistically separated, providing additional decoupling of our application. CQRS also calls for changes in how we store and structure our data.
Enter event sourcing. Instead of persisting the current state of our domain objects or entities, we record historical events about our data. The key advantage is that we can examine our application data at any point in time, rather than just the current state. This pattern changes how we persist and process our data but is surprisingly efficient.
While each of the two patterns can be used exclusively, they complement each other beautifully and facilitate the construction of decoupled, scalable applications or individual services. Stephen Pember explores the fundamentals of each pattern and offers several examples and demonstration code to show how one might actually go about implementing CQRS and ES. Steve discusses task-based UIs and domain-driven design as he outlines some of the advantages—and challenges—that ThirdChannel has seen when developing systems using CQRS and ES over the past year.
The presentation begins with an overview of the growth of non-structured data and the benefits NoSQL products provide. It then provides an evaluation of the more popular NoSQL products on the market including MongoDB, Cassandra, Neo4J, and Redis. With NoSQL architectures becoming an increasingly appealing database management option for many organizations, this presentation will help you effectively evaluate the most popular NoSQL offerings and determine which one best meets your business needs.
Summary of recent progress on Apache Drill, an open-source community-driven project to provide easy, dependable, fast and flexible ad hoc query capabilities.
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
This presentation about Apache Spark covers all the basics that a beginner needs to know to get started with Spark. It covers the history of Apache Spark, what is Spark, the difference between Hadoop and Spark. You will learn the different components in Spark, and how Spark works with the help of architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva. Now, let's get started with what is Apache Spark.
Below topics are explained in this Spark presentation:
1. History of Spark
2. What is Spark
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark architecture
6. Applications of Spark
7. Spark usecase
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course have been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn’s Apache Spark and Scala certification training are designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache and Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming and Shell Scripting Spark
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training
As businesses grow, so does the complexity of their software. New features, new models, and new background processes all continue to be added. . .and developers struggle to make sense of it all. Yet the end user demands a swift and functional experience when interacting with your application. It is paramount to be open to alternative patterns that help tame complex, high-demand services. Two such patterns are command-query responsibility segregation (CQRS) and event sourcing (ES).
Command-query responsibility segregation is an architectural pattern for user-facing applications that extends from the now standard Model-View-Controller (MVC) pattern and is an alternative to the CRUD pattern. At its core, CQRS is about changing how we think of and work with our data by introducing two types of models: all user actions become commands, and a read-only query model powers our views. Commands and queries are logistically separated, providing additional decoupling of our application. CQRS also calls for changes in how we store and structure our data.
Enter event sourcing. Instead of persisting the current state of our domain objects or entities, we record historical events about our data. The key advantage is that we can examine our application data at any point in time, rather than just the current state. This pattern changes how we persist and process our data but is surprisingly efficient.
While each of the two patterns can be used exclusively, they complement each other beautifully and facilitate the construction of decoupled, scalable applications or individual services. Stephen Pember explores the fundamentals of each pattern and offers several examples and demonstration code to show how one might actually go about implementing CQRS and ES. Steve discusses task-based UIs and domain-driven design as he outlines some of the advantages—and challenges—that ThirdChannel has seen when developing systems using CQRS and ES over the past year.
The presentation begins with an overview of the growth of non-structured data and the benefits NoSQL products provide. It then provides an evaluation of the more popular NoSQL products on the market including MongoDB, Cassandra, Neo4J, and Redis. With NoSQL architectures becoming an increasingly appealing database management option for many organizations, this presentation will help you effectively evaluate the most popular NoSQL offerings and determine which one best meets your business needs.
Summary of recent progress on Apache Drill, an open-source community-driven project to provide easy, dependable, fast and flexible ad hoc query capabilities.
Kafka as an event store - is it good enough?Guido Schmutz
Event Sourcing and CQRS are two popular patterns for implementing a Microservices architectures. With Event Sourcing we do not store the state of an object, but instead store all the events impacting its state. Then to retrieve an object state, we have to read the different events related to a certain object and apply them one by one. CQRS (Command Query Responsibility Segregation) on the other hand is a way to dissociate writes (Command) and reads (Query). Event Sourcing and CQRS are frequently grouped and used together to form something bigger. While it is possible to implement CQRS without Event Sourcing, the opposite is not necessarily correct. In order to implement Event Sourcing, an efficient Event Store is needed. But is that also true when combining Event Sourcing and CQRS? And what is an event store in the first place and what features should it implement?
This presentation will first discuss what functionalities an event store should offer and then present how Apache Kafka can be used to implement an event store. But is Kafka good enough or do specific event store solutions such as AxonDB or Event Store provide a better solution?
Cost-based query optimization in Apache HiveJulian Hyde
Tez is making Hive faster, and now cost-based optimization (CBO) is making it smarter. A new initiative in Hive 0.13 introduces cost-based optimization for the first time, based on the Optiq framework.
Optiq’s lead developer Julian Hyde shows the improvements that CBO is bringing to Hive 0.13. For those interested in Hive internals, he gives an overview of the Optiq framework and shows some of the improvements that are coming to future versions of Hive.
Apache Spark is a In Memory Data Processing Solution that can work with existing data source like HDFS and can make use of your existing computation infrastructure like YARN/Mesos etc. This talk will cover a basic introduction of Apache Spark with its various components like MLib, Shark, GrpahX and with few examples.
Relational databases vs Non-relational databasesJames Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016Taehoon Kim
발표 영상 : https://goo.gl/jrKrvf
데모 영상 : https://youtu.be/exXD6wJLJ6s
Deep Q-Network, Double Q-learning, Dueling Network 등의 기술을 소개하며, hyperparameter, debugging, ensemble 등의 엔지니어링으로 성능을 끌어 올린 과정을 공유합니다.
Join our experts Neeraja Rentachintala, Sr. Director of Product Management and Aman Sinha, Lead Software Engineer and host Sameer Nori in a discussion about putting Apache Drill into production.
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you to understand Basics of RDD in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
OSMC 2016 | ZMON: Zalando's OS approach to monitoring in the cloud and DCs by...NETWAYS
Ende 2014 entschieden wir uns, unsere seit 2010 gewachsene Nagios/Icinga-Infrastruktur abzulösen. Ziel war es eine flexible Lösung zu schaffen, die autonomen Teams die Überwachung ihrer heterogenen Systeme ermöglicht. Dies schließt zum Beispiel Micro Service Deployments, Datenbanken wie PostgreSQL und Cassandra sowie Prozess- und Business-Metriken aus anderen Datenquellen ein. Heute überwacht ZMON nicht nur unsere Plattform innerhalb der Rechenzentren, sondern ist auch verantwortlich für das Monitoring innerhalb der rasch wachsenden Anzahl Cloud Deployments. ZMON unterstützt die Teams dabei durch automatisierte Service Discovery auf AWS und das Wiederverwenden von Datenquellen und Alarmen über Teamgrenzen hinweg. Neben dem Alarm Dashboard integriert ZMON Grafana und stellt dem Benutzer historischen Werte aus KairosDB zur Verfügung, um einfach datengetriebene Dashboards zu erstellen.
Unser internes Deployment setzt zu 100% auf Open Source und greift auf dieselben Docker Images zurück, die wir auch in unseren Demos und Installationsskripten verwenden.
OSMC 2016 - ZMON Zalandos OS approach to monitoring in the cloud and DCs by J...NETWAYS
Nach seinem Abschluss in Informatik an der RWTH Aachen verstärkte zunächst Jan das Datenbank Team bei Zalando wo Themen rund um PostgreSQL Performance und Verfügbarkeit im Fokus standen. Nach anderen Themen mit Datenbankbezug im Plattform/Software Team stand im vergangen Jahr die ZMON Weiterentwicklung und Unterstützung für AWS an. Heute arbeitet Jan im Team Cloud Enablement und arbeitet dort eng mit anderen Teams zusammen rund um das Thema Cloud Migration.
Kafka as an event store - is it good enough?Guido Schmutz
Event Sourcing and CQRS are two popular patterns for implementing a Microservices architectures. With Event Sourcing we do not store the state of an object, but instead store all the events impacting its state. Then to retrieve an object state, we have to read the different events related to a certain object and apply them one by one. CQRS (Command Query Responsibility Segregation) on the other hand is a way to dissociate writes (Command) and reads (Query). Event Sourcing and CQRS are frequently grouped and used together to form something bigger. While it is possible to implement CQRS without Event Sourcing, the opposite is not necessarily correct. In order to implement Event Sourcing, an efficient Event Store is needed. But is that also true when combining Event Sourcing and CQRS? And what is an event store in the first place and what features should it implement?
This presentation will first discuss what functionalities an event store should offer and then present how Apache Kafka can be used to implement an event store. But is Kafka good enough or do specific event store solutions such as AxonDB or Event Store provide a better solution?
Cost-based query optimization in Apache HiveJulian Hyde
Tez is making Hive faster, and now cost-based optimization (CBO) is making it smarter. A new initiative in Hive 0.13 introduces cost-based optimization for the first time, based on the Optiq framework.
Optiq’s lead developer Julian Hyde shows the improvements that CBO is bringing to Hive 0.13. For those interested in Hive internals, he gives an overview of the Optiq framework and shows some of the improvements that are coming to future versions of Hive.
Apache Spark is a In Memory Data Processing Solution that can work with existing data source like HDFS and can make use of your existing computation infrastructure like YARN/Mesos etc. This talk will cover a basic introduction of Apache Spark with its various components like MLib, Shark, GrpahX and with few examples.
Relational databases vs Non-relational databasesJames Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016Taehoon Kim
발표 영상 : https://goo.gl/jrKrvf
데모 영상 : https://youtu.be/exXD6wJLJ6s
Deep Q-Network, Double Q-learning, Dueling Network 등의 기술을 소개하며, hyperparameter, debugging, ensemble 등의 엔지니어링으로 성능을 끌어 올린 과정을 공유합니다.
Join our experts Neeraja Rentachintala, Sr. Director of Product Management and Aman Sinha, Lead Software Engineer and host Sameer Nori in a discussion about putting Apache Drill into production.
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you to understand Basics of RDD in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
OSMC 2016 | ZMON: Zalando's OS approach to monitoring in the cloud and DCs by...NETWAYS
Ende 2014 entschieden wir uns, unsere seit 2010 gewachsene Nagios/Icinga-Infrastruktur abzulösen. Ziel war es eine flexible Lösung zu schaffen, die autonomen Teams die Überwachung ihrer heterogenen Systeme ermöglicht. Dies schließt zum Beispiel Micro Service Deployments, Datenbanken wie PostgreSQL und Cassandra sowie Prozess- und Business-Metriken aus anderen Datenquellen ein. Heute überwacht ZMON nicht nur unsere Plattform innerhalb der Rechenzentren, sondern ist auch verantwortlich für das Monitoring innerhalb der rasch wachsenden Anzahl Cloud Deployments. ZMON unterstützt die Teams dabei durch automatisierte Service Discovery auf AWS und das Wiederverwenden von Datenquellen und Alarmen über Teamgrenzen hinweg. Neben dem Alarm Dashboard integriert ZMON Grafana und stellt dem Benutzer historischen Werte aus KairosDB zur Verfügung, um einfach datengetriebene Dashboards zu erstellen.
Unser internes Deployment setzt zu 100% auf Open Source und greift auf dieselben Docker Images zurück, die wir auch in unseren Demos und Installationsskripten verwenden.
OSMC 2016 - ZMON Zalandos OS approach to monitoring in the cloud and DCs by J...NETWAYS
Nach seinem Abschluss in Informatik an der RWTH Aachen verstärkte zunächst Jan das Datenbank Team bei Zalando wo Themen rund um PostgreSQL Performance und Verfügbarkeit im Fokus standen. Nach anderen Themen mit Datenbankbezug im Plattform/Software Team stand im vergangen Jahr die ZMON Weiterentwicklung und Unterstützung für AWS an. Heute arbeitet Jan im Team Cloud Enablement und arbeitet dort eng mit anderen Teams zusammen rund um das Thema Cloud Migration.
Atmosphere 2016 - Jan Mussler - ZMON: Zalando's OS approach to monitoring in...PROIDEA
Two years ago we set out to build our own monitoring tool replacing Icinga. Our biggest focus was flexiblity and autonomy for the growing number of teams and engineers to enable them to monitor their services from small micro services to databases to higher level business KPIs. Today ZMON provides teams with the a federated monitoring solution that gathers data not only in our DCs but also in the connected AWS VPCs and assists teams with service auto discovery and sharing of checks/alerts to make everyone's life easier. ZMON comes along with Grafana2 and KairosDB enabling rich data driven dashboards.
As ZMON is an open source project and relies on some great products (kairosdb/redis) in the background we also provide some insights into how we build, ship, and deploy exactly the same docker images everyone can try for himself using gocd.
Getting all the 99.99(9) you always wanted Mite Mitreski
Any software running on the internet has some expectations of uptime. Many having millions of users can have high availability expectation. While running an application looks like a simple task, constantly evolving software, ever-changing clients and various request peaks can sure cause some headache. This talk will be focusing on the downsides and lessons learned from running and developing high uptime system and show you how you could also get the 99.999… uptime. We will also do some showing of concrete examples in various technologies like Docker, AWS and Java.
by Lin Chunyong and Ryan Deivert, Airbnb
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll will learn how to get started, how to support applications, and how to scale.
Webinar: Securing your data - Mitigating the risks with MongoDBMongoDB
In this webinar, we walked through examples of the general security threats to databases. And we looked at how you can mitigate them for MongoDB deployments.
Bei Jimdo sammeln wir jede Menge Metriken über alle Teile unseres Systems. Dabei fallen Daten auf allen Ebenen des Systems an: Infrastruktur, System und Applikation. Wichtig ist, dass alle Entwickler zu jedem Zeitpunkt Einblick in die Echtzeit-Metriken ihrer Services nehmen können. Um das zu garantieren, haben wir uns einige Zeit mit der Integration von Prometheus in unsere Systeme beschäftigt.
In unserem Talk werden wir sowohl über den Betrieb von Prometheus als auch über die Integrationen mit dem Rest der Jimdo-Plattform sprechen. Wir werden von Stolpersteinen und Tricks berichten, die wir gelernt haben, sowie einen Einblick in unserer Tool-Landschaft geben.
Introduction to WSO2 Data Analytics PlatformSrinath Perera
WSO2 have had several analytics products: WSO2 BAM and WSO2 CEP for some time (or Big Data products if you prefer the term). We are added WSO2 Machine Learner, a product to create, evaluate, and deploy predictive models and renamed WSO2 BAM to WSO2 DAS ( Data Analytics Server).
The platform let you publish ( collect data) once and process them through batch ( Spark) , realtime ( CEP), search the data ( Lucene) and build machine learning models.
This post describes how all those fit within to a single story.
For more information, see https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
NEW LAUNCH! Building Distributed Applications with AWS Step FunctionsAmazon Web Services
AWS Step Functions is a new, fully managed service that makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Step Functions is a reliable way to coordinate components and step through the functions of your application. A graphical console helps you arrange and visualize the components of your application as a series of steps. Step Functions automatically triggers and tracks each step and retries when there are errors so that your application executes in order―and as expected―every time. This session shows how to use Step Functions to create, run, and debug multi-service applications in a matter of minutes. We also share how customers are using Step Functions to reliably build and scale multi-step applications such as order processing, report generation, and data transformation―and to innovate faster.
AWS IoT is a managed cloud platform that lets connected devices easily and securely interact with cloud applications and other devices. As an IoT developer, you will want to interact with AWS services like Kinesis, Lambda, and Amazon Machine Learning to get the most from your IoT application. In this session, we will do a deep dive on how to define rules in the Rules Engine, or retrieve the last known and desired state of device using Device Shadows, learn about the use cases and benefits of AWS Greengrass, and routing data from devices to AWS services to leverage the entire cloud for your Internet of Things application.
MongoDB Days UK: Securing Your Deployment with MongoDB EnterpriseMongoDB
Presented by Mat Keep, Principal Product Manager, MongoDB
Security is more critical than ever with new computing environments in the cloud and expanding access to the Internet. There are a number of security protection mechanisms available for MongoDB to ensure you have a stable and secure architecture for your deployment. We'll walk through general security threats to databases and specifically how they can be mitigated for MongoDB deployments. Topics will include:
- General security tools
- How to configure those for MongoDB
- Security features available in MongoDB such as LDAP, SSL, x.509, authentication, and encryption
Similar to ZMON: Monitoring Zalando's Engineering Platform (20)
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Zalando Technology
In this talk we present Zalando's microservices architecture, introduce Saiki – our next generation data integration and distribution platform on AWS and show how we employ stream processing for near-real time business intelligence.
Zalando is one of the largest online fashion retailers in Europe. In order to secure our future growth and remain competitive in this dynamic market, we are transitioning from a monolithic to a microservices architecture and from a hierarchical to an agile organization.
We first have a look at how business intelligence processes have been working inside Zalando for the last years and present our current approach - Saiki. It is a scalable, cloud-based data integration and distribution infrastructure that makes data from our many microservices readily available for analytical teams.
We no longer live in a world of static data sets, but are instead confronted with an endless stream of events that constantly inform us about relevant happenings from all over the enterprise. The processing of these event streams enables us to do near-real time business intelligence. In this context we have evaluated Apache Flink vs. Apache Spark in order to choose the right stream processing framework. Given our requirements, we decided to use Flink as part of our technology stack, alongside with Kafka and Elasticsearch.
With these technologies we are currently working on two use cases: a near real-time business process monitoring solution and streaming ETL.
Monitoring our business processes enables us to check if technically the Zalando platform works. It also helps us analyze data streams on the fly, e.g. order velocities, delivery velocities and to control service level agreements.
On the other hand, streaming ETL is used to relinquish resources from our relational data warehouse, as it struggles with increasingly high loads. In addition to that, it also reduces the latency and facilitates the platform scalability.
Finally, we have an outlook on our future use cases, e.g. near-real time sales and price monitoring. Another aspect to be addressed is to lower the entry barrier of stream processing for our colleagues coming from a relational database background.
How We Made our Tech Organization and Architecture Converge Towards ScalabilityZalando Technology
In this talk, Dan Persa shared the recipe we have applied to scale our tech team to more than 1,000 people, while redesigning the architecture of our Online Fashion Shop -- Project Mosaic -- to make more than 18 million customers happier. He also shared Zalando Tech’s learnings and takeaways in order to become both more successful and more customer centric.
Zalando's Jan Mußler points out how Docker helps to foster team autonomy while complying with company regulations. As an example he shows how Zalando Tech deploys onto its microservices infrastructure by using their open source platform STUPS (stups.io) along with Docker in their continuous delivery strategy.
https://tech.zalando.com
Berlin Apache Flink Meetup, May 2016
In this talk we present Zalando's microservices architecture and introduce Saiki – our next generation data integration and distribution platform on AWS. We show why we chose Apache Flink to serve as our stream processing framework and describe how we employ it for our current use cases: business process monitoring and continuous ETL. We then have an outlook on future use cases.
By Javier Lopez & Mihail Vieru, Zalando, Zalando SE
There are many ways to run high availability with PostgreSQL. Here, we present a template for you to create your own customized, high-availability solution using Python and for maximum accessibility, a distributed configuration store like ZooKeeper or etcd.
Reactive Design Patterns: a talk by Typesafe's Dr. Roland KuhnZalando Technology
We had the great pleasure of hosting a talk by Dr. Roland Kuhn: leader of Typesafe’s Akka project, and coauthor of the book Reactive Design Patterns and the Reactive Manifesto. For a standing-room-only crowd, Roland highlighted the importance of making reactive software: of considering responsiveness, maintainability, elasticity and scalability from the outset of development. He explored several architecture elements that are commonly found in reactive systems, such as the circuit breaker, various replication techniques, and flow control protocols. These patterns are language-agnostic and also independent of the abundant choice of reactive programming frameworks and libraries. Check out his slides!
At the Dublin Fashion Insights Centre, we are exploring methods of categorising the web into a set of known fashion related topics. This raises questions such as: How many fashion related topics are there? How closely are they related to each other, or to other non-fashion topics? Furthermore, what topic hierarchies exist in this landscape? Using Clojure and MLlib to harness the data available from crowd-sourced websites such as DMOZ (a categorisation of millions of websites) and Common Crawl (a monthly crawl of billions of websites), we are answering these questions to understand fashion in a quantitative manner.
The latest generation of big data tools such as Apache Spark routinely handle petabytes of data while also addressing real-world realities like node and network failures. Spark's transformations and operations on data sets are a natural fit with Clojure's everyday use of transformations and reductions. Spark MLlib's excellent implementations of distributed machine learning algorithms puts the power of large-scale analytics in the hands of Clojure developers. At Zalando's Dublin Fashion Insights Centre, we're using the Clojure bindings to Spark and MLlib to answer fashion-related questions that until recently have been nearly impossible to answer quantitatively.
Hunter Kelly @retnuh
tech.zalando.com
Building a Reactive RESTful API with Akka Http & SlickZalando Technology
Dan Persa, Senior Software Engineer at Zalando
Dan Persa has been a software engineer at Zalando since 2013 and is a member of the Fashion Store team, which is responsible for Zalando’s core ecommerce business. He loves Java and Scala and more recently has been exploring Go and Node.js. He’s a big fan of Clean Code and Software Craftsmanship. In addition to coding, he enjoys mentoring new developers, organizing coder dojos and reading groups and giving tech talks. In his free time he likes to take photos and dance salsa.
tech.zalando.com
This July we teamed up with TestObject to host a Mobile Quality Crew Meetup at the Zalando Technology building in Berlin.
Presentations included:
Mobile Testing Challenges @ZalandoTech
Dmitry Bespalov, iOS Engineer
and Hendrik Seffler, Delivery Lead Engineering Productivity at Zalando
Efforts of automation @6Wunderkinder
Justin Ison is a QA Lead at 6Wunderkinder
http://www.meetup.com/Zalando-Tech-Events-Berlin/
Auto-scaling your API: Insights and Tips from the Zalando TeamZalando Technology
A presentation given by Zalando software engineers Sean Floyd and Luis Mineiro at the JBCNConf in Barcelona, Spain, June 2015. Zalando is Europe's largest online fashion platform, doing business in 15 countries with more than 15 million users. Visit tech.zalando.com for more information about Zalando's technology, open source projects and opportunities.
Radical Agility with Autonomous Teams and Microservices in the CloudZalando Technology
A talk by software engineers Jan Löffler and Henning Jacobs on Zalando's adoption of microservices, cloud computing and autonomous teams. Zalando is Europe's largest online fashion platform, doing business in 15 countries with more than 15 million users. Visit tech.zalando.com for more information about Zalando's technology, open source projects and opportunities.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
2. 15 countries
3 fulfillment centers
16+ million active customers
2.2+ billion € revenue 2014
130+ million visits per month
8.000+ employees
ONE of EUROPE’S LARGEST ONLINE FASHION RETAILERS
Visit us: tech.zalando.com
5. Monitoring Situation Until Late 2013
ICINGA plus custom frontend (ZMON 1)
Did not scale with growth:
● Our UI became too slow
● Number of systems to check too many
● Number of teams that wanted checks grew
● Every request had to go through single team
6. Improve performance and throughput
Autonomy for individual teams
Flexibility and extendability
Integration into tooling (CMDB, DeployCtl …)
Goals of new ZMON development
7. Entity:
Anything you may want to monitor
Can be used as a "dimension"
Checks:
Runnable Python snippet fetching data
Alert on Check:
Python expression yielding true or false
The basic terminology ...
8. Zalando Tech - 24x7 team setup
Incident Team
Incident Team
Alerts
Database
Team
Alerts
2nd Level
Database
Calls if help needed
Inheritance with
custom thresholds
SMS / E-Mail
observes
16. Integrated easy-to-use entity store with REST API
>zmon entities push local-postgres.yaml
Entity Service
id: localhost:5432
type: postgres
host: localhost
port: 5432
shards:
local_zmon_db: "localhost:5432/local_zmon_db"
local-postgres.yaml
17. ● select subset of entities
● executes Python expression
○ powerful using eval with custom context
○ Builtins: HTTP, PostgreSQL, MySQL, Cloudwatch,
Redis, SNMP, tcp, SOAP, Scalyr...
● returns "value" object
○ Quickly, every check returned "dicts"
Checks
18. REST API to update / auto-import from SCM
zmon check-definitions update select-1-check.yaml
Managing checks
name: "Select 1"
owning_team: "Team 1"
command: |
sql().execute("select 1 as a").results()
entities:
- type: postgres
interval: 15
description: "test connection"
select-1-check.yaml
19.
20. ● Executes using a check’s value, bound to single check
● Defines team and responsible team
● Allows inheritance from other alert
● Evaluates Python expression yielding True/False
● No "WARNING" state, no "UNKNOWN" state
● Priorities and tags
Alerts
24. Anyone can add alerts to checks
Alerts are owned by team
Monitor application boundaries/dependencies
Make use of inheritance to customize
Sharing and reuse of alerts and checks
40. ZMON in AWS Setup
*.foo.example.org *.bar.example.org
Team "Foo" Team "Bar"
EC2
Instance
EC2
InstanceEC2
Instance
EC2
Instance
ZMON
Appliance
ZMON
Appliance
KairosDB
EC2
Instance
EC2
Instance
ZMON
Data Service
ELB ELB
41. ● Scheduler supports queue filters by entity
○ e.g. {"dc":"dc1"} vs {"dc":"dc2"} queue filters
● Scheduler can apply base filter
○ only handles entities with {"dc":"dc1"}
● Worker can report home using:
○ Redis (we use this across DCs)
○ HTTPS (AWS->DC)
Multi DC / Zone deployment possible
42. Uses Amazon API to fetch:
● ELBs
● EC2 instances
● RDS instances
Pushes enriched entities to entity service
ZMON AWS Agent
44. Kubernetes Example: Exports in Prometheus text format
kubelet_docker_operations_latency_microseconds{operation_type="inspect_container",quantile="0.9"} 9602
kubelet_docker_operations_latency_microseconds{operation_type="list_containers",quantile="0.9"} 9740