After Apache Spark established itself in 2015 as a serious alternative among the big data frameworks and began overtaking Hadoop MapReduce, unexpected competition now arrives from Berlin in the form of Apache Flink.
Video accompanying the slides:
https://www.youtube.com/watch?v=-MmX44pjJ9s&list=PL6ceXNIVUaAKIxQO_aBLlWpp48x-cRzOE&index=2
Spark & Hadoop User Group:
http://www.meetup.com/Hadoop-User-Group-Munich/
The story of a server developer, twelve years on the single path of "Japring" (Java + Spring), who met "Kopring" (Kotlin + Spring), got to know Kotlin's features and Spring's Kotlin support, and survived in the Kopring world…
Code repository: https://github.com/arawn/kotlin-support-in-spring
As your data grows, the need to establish proper indexes becomes critical to performance. MongoDB supports a wide range of indexing options to enable fast querying of your data, but what are the right strategies for your application?
In this talk we’ll cover how indexing works, the various indexing options, and use cases where each can be useful. We'll dive into common pitfalls using real-world examples to ensure that you're ready for scale.
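For flavor, here is a minimal sketch using the MongoDB Java driver; the collection, field names and the strategies shown (compound index with the equality field first, plus a TTL index) are illustrative assumptions rather than the talk's actual material:

import java.util.concurrent.TimeUnit;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class IndexSetup {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events = client.getDatabase("app").getCollection("events");
            // Compound index: put the equality field before the sort/range field
            events.createIndex(Indexes.compoundIndex(
                    Indexes.ascending("userId"), Indexes.descending("createdAt")));
            // TTL index: documents are removed once their expiresAt date passes
            events.createIndex(Indexes.ascending("expiresAt"),
                    new IndexOptions().expireAfter(0L, TimeUnit.SECONDS));
        }
    }
}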
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink (Flink Forward)
http://flink-forward.org/kb_sessions/joining-infinity-windowless-stream-processing-with-flink/
The extensive set of high-level Flink primitives makes it easy to join windowed streams. However, use cases that don’t have windows can prove to be more complicated, making it necessary to leverage operator state and low-level primitives to manually implement a continuous join. This talk will focus on the anomalies that present themselves when performing streaming joins with infinite windows, and the problems encountered operating topologies that back user-facing data. We will describe the approach taken at ResearchGate to implement and maintain a consistent join result of change data capture streams.
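ResearchGate's actual implementation is not reproduced here, but the core idea (keyed operator state inside a low-level two-input function) can be sketched minimally; the String record types and the comma encoding are invented for illustration:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

// Continuously joins two change-data-capture streams keyed by the same id:
// each side remembers its latest value and emits a joined record whenever
// either side updates. No window ever closes, so the state lives forever.
public class ContinuousJoin extends CoProcessFunction<String, String, String> {
    private transient ValueState<String> left;
    private transient ValueState<String> right;

    @Override
    public void open(Configuration parameters) {
        left = getRuntimeContext().getState(new ValueStateDescriptor<>("left", String.class));
        right = getRuntimeContext().getState(new ValueStateDescriptor<>("right", String.class));
    }

    @Override
    public void processElement1(String value, Context ctx, Collector<String> out) throws Exception {
        left.update(value);
        if (right.value() != null) out.collect(value + "," + right.value());
    }

    @Override
    public void processElement2(String value, Context ctx, Collector<String> out) throws Exception {
        right.update(value);
        if (left.value() != null) out.collect(left.value() + "," + value);
    }
}

Wired up via leftStream.connect(rightStream).keyBy(...).process(new ContinuousJoin()); the ever-growing state is exactly the operational concern the abstract mentions.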
Have you always wanted to use the advantages and ideas of Scala in your Java or Kotlin projects? Then Vavr (formerly Javaslang) is exactly the right library for you.
Using real project examples, we will look at the benefits Vavr's syntactic extensions and features offer in everyday work. We will look at value types and true functional data types, and learn how to handle exceptions more sensibly. All for more maintainable and cleaner code!
Vavr makes it possible to enjoy the advantages of object-functional programming without turning your back on Java.
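As a small taste of the exception handling mentioned above, a minimal Vavr sketch (the parsing and environment-variable examples are invented):

import io.vavr.control.Option;
import io.vavr.control.Try;

public class VavrDemo {
    public static void main(String[] args) {
        // Try captures the exception as a value instead of throwing it
        Try<Integer> parsed = Try.of(() -> Integer.parseInt("not a number"));
        int port = parsed.getOrElse(8080);  // fall back to a default
        System.out.println(port);           // prints 8080

        // Option instead of null checks
        Option<String> host = Option.of(System.getenv("APP_HOST"));
        System.out.println(host.getOrElse("localhost"));
    }
}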
A comparison of different solutions for full-text search in web applications using PostgreSQL and other technology. Presented at the PostgreSQL Conference West, in Seattle, October 2009.
After a short introduction to the Java driver for MongoDB, we'll have a look at the more abstract persistence frameworks like Morphia, Spring Data, Jongo and Hibernate OGM.
Java Persistence Frameworks for MongoDB (Tobias Trelle)
After a short introduction to the MongoDB Java driver we'll have a detailed look at higher level persistence frameworks like Morphia, Spring Data MongoDB and Hibernate OGM with lots of examples.
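To hint at the abstraction level of these frameworks, a minimal Spring Data MongoDB sketch (the entity and finder method are invented; Spring generates the implementation at runtime):

import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.repository.MongoRepository;

class Talk {
    @Id String id;
    String speaker;
    int year;
}

// Spring Data derives the MongoDB query from the method name,
// no hand-written driver code required
interface TalkRepository extends MongoRepository<Talk, String> {
    List<Talk> findBySpeakerAndYearGreaterThan(String speaker, int year);
}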
Presented on 10/11/12 at the Boston Elasticsearch meetup held at the Microsoft New England Research & Development Center. This talk gave a very high-level overview of Elasticsearch to newcomers and explained why ES is a good fit for Traackr's use case.
Video available here: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677
We all know that MongoDB is one of the most flexible and feature-rich databases available. In this webinar we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.
DevFest Istanbul - a free guided tour of Neo4J (Florent Biville)
2013-11-02: DevFest Türkiye, Istanbul.
Slightly modified version of my previous Neo4J introduction talk, given at the Soft-Shake event in Geneva, Switzerland.
Distributed Computing and Caching in the Cloud: Hazelcast and Microsoft (Comsysto Reply GmbH)
Cloud, Docker and microservices - these are not just words in the software industry; they mark the biggest paradigm shift in how software is developed in decades. Dynamic and automatic scalability, instant fail-over and the easiest deployment solution ever possible. Hazelcast and Microsoft have worked together to bring the power of in-memory computing and caching to the Azure cloud. See Christoph Engelbert from Hazelcast and Vito Flavio Lorusso from Microsoft present how easy it is to use Hazelcast, the fastest and easiest way to run it on Azure, and how to run high-performance applications at global scale leveraging Azure's global infrastructure.
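For a sense of the programming model, a minimal embedded Hazelcast sketch (map name and entries are invented; the 3.x-era API is assumed). Starting several such nodes, on Azure or elsewhere, forms a cluster that partitions the map across members:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class CacheNode {
    public static void main(String[] args) {
        // Each instance discovers its peers and joins the cluster
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> cache = hz.getMap("sessions"); // distributed map
        cache.put("user-42", "token-abc");
        System.out.println(cache.get("user-42"));
    }
}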
Basic concepts of Elm, React and AngularDart 2 in comparison (Comsysto Reply GmbH)
Basic concepts of Elm, React and AngularDart 2 compared, with pizza and beer.
One prototype, three implementations! Where are the differences and similarities? Experiences and opinions from all participants are explicitly welcome!
Speaker Bio:
Mohammed El Batya
Enthusiastic Java/Spring, Android and web developer ... basically a "full-stack developer". Developer of PendelPanda for Android. Discovering and trying out new technologies, programming languages and frameworks is my hobby!
Are you interested in getting deep insight into the new features that Project Jigsaw offers in Java 9?
Project Jigsaw is one of the biggest changes introduced in Java since the launch of the Java programming language back in 1995. It has a great impact on the way we architect and develop Java applications.
Project Jigsaw represents a brand new modular system that brings lots of features and empowers developers to build modular applications using Java 9.
In this presentation you will see how the entire JDK was divided into modules and how the source code was reorganized around them.
You will learn everything you need to know in order to start developing reliable, secure and maintainable modular Java applications with Project Jigsaw.
You will see how to define modules and how to compile, package and run a Java application using Jigsaw.
You'll learn how to take advantage of the new module path and how to create modular run-time images: smaller, more compact JREs that consist only of the modules you need.
Do you have a Java 7 or 8 application that you intend to migrate to Java 9? In this talk you'll learn how to do it using top-down and bottom-up migration.
Are you afraid that your application code will break when switching to Java 9? No problem, you’ll see what you should do in order to make your application suitable for Java 9.
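For flavor, a minimal module declaration of the kind covered in the talk (module and package names are invented):

// src/com.example.inventory/module-info.java
module com.example.inventory {
    requires java.sql;                  // read another module's exported packages
    exports com.example.inventory.api;  // only this package is visible to consumers
}

// A modular run-time image containing only the required modules could then
// be assembled with, for example:
//   jlink --module-path mods --add-modules com.example.inventory --output inventory-jre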
Continuous Processing with Apache Flink - Strata London 2016 (Stephan Ewen)
Talk from the Strata + Hadoop World conference in London, 2016: Apache Flink and Continuous Processing.
The talk discusses some of the shortcomings of building continuous applications via batch processing, and how a stream processing architecture naturally solves many of these issues.
HBaseCon 2012 | Lessons learned from OpenTSDB - Benoit Sigoure, StumbleUpon (Cloudera, Inc.)
OpenTSDB was built on the belief that, through HBase, a new breed of monitoring systems could be created, one that can store and serve billions of data points forever without the need for destructive downsampling, one that could scale to millions of metrics, and where plotting real-time graphs is easy and fast. In this presentation we’ll review some of the key points of OpenTSDB’s design, some of the mistakes that were made, how they were or will be addressed, and what were some of the lessons learned while writing and running OpenTSDB as well as asynchbase, the asynchronous high-performance thread-safe client for HBase. Specific topics discussed will be around the schema, how it impacts performance and allows concurrent writes without need for coordination in a distributed cluster of OpenTSDB instances.
A Tale of Two APIs: Using Spark Streaming In Production (Lightbend)
Fast Data architectures are the answer to the increasing need for the enterprise to process and analyze continuous streams of data to accelerate decision making and become reactive to the particular characteristics of their market.
Apache Spark is a popular framework for data analytics. Its capabilities include SQL-based analytics, dataflow processing, graph analytics and a rich library of built-in machine learning algorithms. These libraries can be combined to address a wide range of requirements for large-scale data analytics.
To address Fast Data flows, Spark offers two APIs: the mature Spark Streaming and its younger sibling, Structured Streaming. In this talk, we are going to introduce both APIs. Using practical examples, you will get a taste of each one and obtain guidance on how to choose the right one for your application.
Http4s, Doobie and Circe: The Functional Web Stack (Gary Coady)
Http4s, Doobie and Circe together form a nice platform for building web services. This presentation provides an introduction to using them to build your own service.
Last year, in Apache Spark 2.0, Databricks introduced Structured Streaming, a new stream processing engine built on Spark SQL, which revolutionized how developers can write stream processing applications. Structured Streaming enables users to express their computations the same way they would express a batch query on static data. Developers can express queries using powerful high-level APIs including DataFrames, Datasets and SQL. Then, the Spark SQL engine is capable of converting these batch-like transformations into an incremental execution plan that can process streaming data, while automatically handling late, out-of-order data and ensuring end-to-end exactly-once fault-tolerance guarantees.
Since Spark 2.0, Databricks has been hard at work building first-class integration with Kafka. With this new connectivity, performing complex, low-latency analytics is now as easy as writing a standard SQL query. This functionality, in addition to the existing connectivity of Spark SQL, makes it easy to analyze data using one unified framework. Users can now seamlessly extract insights from data, independent of whether it is coming from messy / unstructured files, a structured / columnar historical data warehouse, or arriving in real-time from Kafka/Kinesis.
In this session, Das will walk through a concrete example where – in less than 10 lines – you read Kafka, parse JSON payload data into separate columns, transform it, enrich it by joining with static data and write it out as a table ready for batch and ad-hoc queries on up-to-the-last-minute data. He’ll use techniques including event-time based aggregations, arbitrary stateful operations, and automatic state management using event-time watermarks.
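Das's exact demo is not reproduced here, but in spirit the pipeline looks roughly like this Java sketch (broker address, topic, schema and paths are invented):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class KafkaToTable {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("kafka-demo").getOrCreate();
        StructType schema = new StructType()
                .add("userId", "string").add("amount", "double").add("ts", "timestamp");
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "events")
                .load()
                .selectExpr("CAST(value AS STRING) AS json")    // Kafka value arrives as bytes
                .select(from_json(col("json"), schema).as("e")) // parse the JSON payload
                .select("e.*");                                 // one column per field
        events.writeStream()
                .format("parquet")
                .option("path", "/data/events")
                .option("checkpointLocation", "/data/checkpoints/events")
                .start()
                .awaitTermination();
    }
}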
Let's look at writing a new Struts 2 application from square one, using the Yahoo User Interface (YUI) Library on the front end, and Struts 2 on the backend. YUI provides the glitz and the glamour, and Struts 2 provides the dreary business logic, input validation, and text formatting.
Recent developments in Hadoop version 2 are pushing the system from the traditional, batch oriented, computational model based on MapRecuce towards becoming a multi paradigm, general purpose, platform. In the first part of this talk we will review and contrast three popular processing frameworks. In the second part we will look at how the ecosystem (eg. Hive, Mahout, Spark) is making use of these new advancements. Finally, we will illustrate "use cases" of batch, interactive and streaming architectures to power traditional and "advanced" analytics applications.
Spark Schema for Free with David Szakallas (Databricks)
DataFrames are essential for high-performance code, but sadly lag behind in development experience in Scala. When we started migrating our existing Spark application from RDDs to DataFrames at Whitepages, we had to scratch our heads real hard to come up with a good solution. DataFrames come at a loss of compile-time type safety and there is limited support for encoding JVM types.
We wanted more descriptive types without the overhead of Dataset operations. The data binding API should be extendable. Schema for input files should be generated from classes when we don’t want inference. UDFs should be more type-safe. Spark does not provide these natively, but with the help of shapeless and type-level programming we found a solution to nearly all of our wishes. We migrated the RDD code without any of the following: changing our domain entities, writing schema description or breaking binary compatibility with our existing formats. Instead we derived schema, data binding and UDFs, and tried to sacrifice the least amount of type safety while still enjoying the performance of DataFrames.
Ajax is the web's hottest user interface. Struts is Java's most popular web framework. What happens when we put Ajax on Struts?
In this session, we look at writing a new Struts 2 application from square one, using the Yahoo User Interface (YUI) Library on the front end, and Struts 2 on the backend. YUI provides the glitz and the glamour, and Struts 2 provides the dreary business logic, input validation, and text formatting.
During the session, we will cover
* How to integrate an Ajax UI with Struts 2
* Basics of the Yahoo User Interface (YUI) Library
* Business services Struts can provide to an Ajax UI
Who should attend: Ajax developers who would like to utilize Struts as a back-end, and Struts developers who would like to utilize Ajax as a front-end.
To get the most from this session, some familiarity with an Ajax library, like YUI or Dojo, is helpful.
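A minimal sketch of the kind of back-end service Struts 2 can expose to an Ajax UI; the action itself and the use of the struts2-json-plugin's json result type are assumptions for illustration:

import com.opensymphony.xwork2.ActionSupport;

// Mapped to a "json" result via the struts2-json-plugin, this action's
// getters are serialized so the YUI front end can fetch it via XMLHttpRequest.
public class UserLookupAction extends ActionSupport {
    private String username;   // populated by Struts from the request parameter
    private boolean available;

    @Override
    public String execute() {
        available = !"admin".equals(username); // the dreary business logic lives here
        return SUCCESS;
    }

    public void setUsername(String username) { this.username = username; }
    public String getUsername() { return username; }
    public boolean isAvailable() { return available; }
}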
Spark Streaming Programming Techniques You Should Know with Gerard Maas (Spark Summit)
At its heart, Spark Streaming is a scheduling framework, able to efficiently collect and deliver data to Spark for further processing. While the DStream abstraction provides high-level functions to process streams, several operations also grant us access to deeper levels of the API, where we can directly operate on RDDs, transform them to Datasets to make use of that abstraction or store the data for later processing. Between these API layers lie many hooks that we can manipulate to enrich our Spark Streaming jobs. In this presentation we will demonstrate how to tap into the Spark Streaming scheduler to run arbitrary data workloads, we will show practical uses of the forgotten ‘ConstantInputDStream’ and will explain how to combine Spark Streaming with probabilistic data structures to optimize the use of memory in order to improve the resource usage of long-running streaming jobs. Attendees of this session will come out with a richer toolbox of techniques to widen the use of Spark Streaming and improve the robustness of new or existing jobs.
4Developers 2018: Pyt(h)on vs the elephant: the current state of big data processing... (PROIDEA)
Historically, the world of big data was reserved for technologies from the Java world. On the other hand, Python has been growing strongly for years in data analysis and scientific computing, which usually operate on smaller data. Nevertheless, a lot has changed recently: Python has become an increasingly important language in the Spark project, new Python projects for working with big data, such as Dask, are becoming ever more popular, and more and more managed cloud platforms such as Google BigQuery are widely available and easy to use from Python. In this presentation I summarize the current state of big data analytics in Python, backed by real examples, the advantages and disadvantages of the respective approaches, and thoughts on what the future may bring.
Midway in our life's journey, I went astray from the straight imperative road and woke to find myself alone in a dark declarative wood.
My guide out of this dark declarative wood was a familiar friend, SQL, who showed me the way to wrap a context of a window to push through using Window Functions to escape the Inferno.
Next I found myself somewhere in-between running up hill with one foot in front of the other advancing so as the leading foot was always above the ground running with my friend LINQ, I was able to wrap the context of a collection around my data to advance my journey through Purgatorio.
My last guide into the blinding brilliant light of Paradiso was from the Dutch Caribbean, who taught me how to wrap my computations into a context and move my data through leading me into brilliant bliss.
Join me on my divine data comedy.
Similar to 21.04.2016 Meetup: Spark vs. Flink (20)
Abstract
The idea of this talk is to help development teams make correct architectural decisions.
Andrei will highlight the basic architectural principles and show ways to achieve architecture that is good enough to cover the project requirements and evolve in the future.
He will also present several cases from real projects, where wrong, missing, or over-sophisticated architecture decisions really hurt the development teams:
- Painful sharing: do shared modules increase reusability or will be the source of problems?
- Microservices are the solution to every problem!
- Non-extensible extensibility: too sophisticated configuration hurts
- Over fine-grained: incorrect splitting into microservices can make life even harder than with a monolith
- Conway horizontal split: how an organization-driven split can jeopardise the architecture
- Model-driven: central responsibility blocks and limits the team
- Cargo cult: blindly following patterns and rules can produce an unmaintainable system
- Freestyle architecture: what happens if teams completely ignore architecture
- Improve with less intelligence: smart endpoints and dumb pipes
Hexagonal architecture has been a popular topic in the software engineering community for several years. Not least due to the microservice trend of recent years, it is a widespread architectural style that can help structure services. When putting it into practice, the question often arises of how to achieve its benefits: decoupled business logic, testability, extensibility and a clear application structure.
The video for the talk can be found at the following link:
https://youtu.be/sgAXtNv7LjM
In this talk, Otto and Sven present their hands-on experience with hexagonal architecture, discuss its essential building blocks, and show how they contribute to the various architectural goals.
The talk is aimed at developers and architects who want to better understand the benefits and trade-offs of applying hexagonal architecture.
Software Architecture and Architects: useless VS valuable (Comsysto Reply GmbH)
Abstract:
This talk introduces definitions of system architecture and proposes a way to achieve a "good enough" architecture that covers the project requirements.
Andrei will show several cases from real projects, where wrong, missing or over-sophisticated architecture decisions really hurt the development teams:
Painful sharing: do shared modules increase reusability or will be the source of problems?
Non-extensible extensibility: too sophisticated configuration hurts
Over fine-grained: incorrect splitting into microservices can make life even harder than with a monolith
Cargo cult: blindly following patterns and rules can produce an unmaintainable system
Freestyle architecture: what happens if teams completely ignore architecture
Improve with less intelligence: smart endpoints and dumb pipes
We are looking forward to meeting many of you in person and having great discussions around this topic!
https://www.meetup.com/de-DE/meetup-group-tfyvuydp/
Abstract:
Data Visualization describes the process of transferring data into images on the computer. Only through images are humans able to get insights into large amounts of data, which makes data visualization a very important part of data analytics. However, there are a lot of techniques and tools available, and which ones are suited best for which tasks? Also, the process of creating a data visualization, selecting the right visual mappings, colors, chart types, etc., is very complex and can confuse viewers if not done properly. In this talk, I will reflect on current developments in data visualization research and how data visualization can be used in a data science workflow.
Speaker Bio:
Dr. Johanna Schmidt is head of the research unit “Visual Analytics” at VRVis Zentrum für Virtual Reality und Visualisierung Forschungs-GmbH in Vienna. She received her Master’s degree in Computer Science and afterward continued with a Ph.D. in data visualization at TU Vienna, Austria. Her current research focuses on the visual analysis of large datasets, mainly manufacturing data originating from industry companies and time series data. Additionally, she is a lecturer at the TU Vienna and at the FH Salzburg.
https://www.meetup.com/de-DE/predictive-analytics-for-industry-4-0/
Microservices bring numerous advantages to backend development. Couldn't those advantages also be used in the frontend? The idea of the micro frontend is currently taking shape for exactly this purpose, and WebComponents, for example, can be used to implement it. In this talk, a web component is built on top of standard APIs, showing what is already possible in every current browser. The ever-popular Angular framework also offers a way to implement WebComponents with Angular Elements.
In a demo, these two worlds are brought together, and we look at alternatives offered by other frameworks such as React or Polymer. We also report on more than a year of hands-on experience implementing micro frontends.
Speaker:
Thomas Bröll works as a Principal Consultant for Trivadis in Stuttgart. He has been a software developer, consultant and architect in the Java space for more than two decades. His focus is the design and implementation of web applications and business applications based on Java, web and cloud technologies.
He also works for Trivadis as a speaker and trainer for Java and Java Enterprise.
Marius Hilleke successfully completed the Java trainee program at Trivadis and currently works as a consultant and developer, dealing with various technologies for cloud-based applications, such as the Spring and Angular frameworks. Beyond that, he is particularly interested in cloud technologies such as Docker and Kubernetes.
Where is the line between open and protection-worthy elements in data and APIs for an AI? Where does AI begin within a company, and which new business areas can it serve? These are challenges you face right at the start of the "road to AI". Through insights from completed projects, we want to sharpen your awareness of what AI should aim for, use market trends to name the drivers behind the many new developments, and sketch the ecosystem we will be building on.
Learn from the experience Telekom has gained with its partners over the last two years, and how a German open-source-based cloud is being developed further as an AI platform in the context of openness and ecosystems.
Bable on Smart City Munich Meetup: How cities are leveraging innovative partn... (Comsysto Reply GmbH)
In line with the Smart City Munich Meetup topic "10 years experiences in Smart City projects - Lessons learned", Shannon from Bable showed us insights into real-life projects and opportunities for partnerships between cities and companies.
You want to join the Smart City Munich Community? Follow us here: https://www.meetup.com/de-DE/Smart-City-Munich
Data Reliability Challenges with Spark by Henning Kropp (Spark & Hadoop User ... (Comsysto Reply GmbH)
Current data lake projects face enormous difficulties in generating business value. According to Gartner, more than 65% of these projects fail. The most common reasons for failure center around data reliability and performance issues, resulting in delays, complexity and errors.
Delta is the next-generation analytics engine as part of the Databricks Runtime tackling some of the most challenging issues with Spark today. Delta provides ACID, Data Versioning, and Schema Enforcement on top of Apache Parquet. In this talk, we will discuss the current challenges and give a live demo of Delta.
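To make that concrete, a minimal sketch of using Delta from Spark's Java API (paths are invented; assumes the Delta libraries are on the classpath):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class DeltaDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("delta-demo").getOrCreate();
        Dataset<Row> events = spark.read().json("/data/raw/events");
        // ACID append: concurrent readers never observe half-written files
        events.write().format("delta").mode(SaveMode.Append).save("/data/delta/events");
        // Data versioning: time-travel back to an earlier snapshot of the table
        Dataset<Row> firstVersion = spark.read().format("delta")
                .option("versionAsOf", 0)
                .load("/data/delta/events");
        firstVersion.show();
    }
}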
"Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wo...Comsysto Reply GmbH
Looking at the IT landscape of big and medium-sized companies, Hadoop Data Lakes are no rarity anymore. Classical Data Warehouses stay on the map as well. So we usually have a hybrid landscape, historically grown and more or less loosely coupled. To gain value from this setup, it requires a holistic and use case oriented approach. This session presents a best-practice architecture. We will illustrate the strengths and shortcomings of its components. On the basis of a real project example we will discuss which challenge can be tackled best by which part.
Kolja:
Kolja works with Woodmark Consulting (based in Munich) on solving customers' data challenges. In consulting projects he typically designs architectures and frameworks for data integration. Currently Kolja focusses on aspects of Hybrid Architectures. He studies how established components from classical Data Warehouses and those from modern Hadoop environments can be smartly combined. Kolja holds a M.Sc. in Computer Science from the TU Munich with focus on databases and information systems.
"Hybrid Architectures, Data Lakes + Data Warehouse"
The big data discussion continues, and practice shows that data lakes do not replace but complement the data warehouse. Which new scenarios are possible? What are the strengths of hybrid architectures, i.e. the combination of data lakes and data warehouses?
Many people promise fast data as the next step after big data. The idea of creating a complete end-to-end data pipeline that combines Spark, Akka, Cassandra, Kafka, and Apache Mesos came up two years ago, sometimes called the SMACK stack. The SMACK stack is an ideal environment for handling all sorts of data-processing needs which can be nightly batch-processing tasks, real-time ingestion of sensor data or business intelligence questions. The SMACK stack includes a lot of components which have to be deployed somewhere. Let’s see how we can create a distributed environment in the cloud with Terraform and how we can provision a Mesos-Cluster with Mesosphere Datacenter Operating System (DC/OS) to create a powerful fast data platform.
Apache Apex: Stream Processing Architecture and Applications (Comsysto Reply GmbH)
• Architecture highlights: high throughput, low latency, operability with stateful fault tolerance, strong processing guarantees, auto-scaling, etc.
• Application development model, a unified approach for real-time and batch use cases (see the sketch after this list)
• Tools for ease of use, ease of operability and ease of management
• How customers use Apache Apex in production
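As a flavor of that application development model, a minimal, self-contained sketch; LineSource and Printer are invented stand-ins for real operators (production applications would typically use operators from the Apex Malhar library):

import com.datatorrent.api.DAG;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.util.BaseOperator;
import org.apache.hadoop.conf.Configuration;

// An Apex application is a DAG of operators connected by streams.
public class Application implements StreamingApplication {

    // Emits a constant line each streaming window (stand-in for a real source)
    public static class LineSource extends BaseOperator implements InputOperator {
        public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();
        @Override
        public void emitTuples() { output.emit("hello apex"); }
    }

    // Prints each tuple it receives (stand-in for a real sink)
    public static class Printer extends BaseOperator {
        public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
            @Override
            public void process(String tuple) { System.out.println(tuple); }
        };
    }

    @Override
    public void populateDAG(DAG dag, Configuration conf) {
        LineSource source = dag.addOperator("source", new LineSource());
        Printer sink = dag.addOperator("sink", new Printer());
        dag.addStream("lines", source.output, sink.input);
    }
}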
A Process Learns to Walk: Controlling LEGO Mindstorms with BPMN (Comsysto Reply GmbH)
Business processes are as much a part of everyday business as LEGO is a part of childhood. Yet many companies lack transparency and reusability in these processes. With Business Process Model and Notation (BPMN), this shortcoming can be eliminated, and business departments and engineering grow together. In addition, the business world is visualized across locations. The result is diagrams that serve as documentation for the business and as the basis for automation in IT.
In contrast stands hands-on learning with LEGO Mindstorms. A small robot is assembled in manifold variations, fed through a Java API and controlled by external software. Steps executed in sequence and in parallel bring our mobile friend to life.
This talk bridges the gap between processes and robots. Movement sequences are defined with BPMN 2 and automated by a business process engine. The modelling approach is covered, including BPMN basics, the structuring of processes, and motor and sensor control, all illustrated in a live demo.
All in all, a topic that can excite programmers and tinkerers, professionals and novices, adults and children alike.
https://youtu.be/bBJrKY_OBLc
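The talk does not prescribe a specific engine; as one concrete possibility, a BPMN service task delegate for a Java-based engine such as Camunda might look like the following sketch (RobotArm is an invented stand-in for the LEGO Mindstorms Java API):

import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

// Invented stand-in for the robot's Java API
class RobotArm {
    static void rotate(int degrees) {
        System.out.println("rotating motor by " + degrees + " degrees");
    }
}

// Referenced from a BPMN service task (e.g. via camunda:class); the engine
// invokes execute() when the process token reaches that task.
public class MoveForwardDelegate implements JavaDelegate {
    @Override
    public void execute(DelegateExecution execution) throws Exception {
        Integer degrees = (Integer) execution.getVariable("degrees"); // process variable
        RobotArm.rotate(degrees);
    }
}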
Geospatial applications created using JavaScript (and NoSQL) (Comsysto Reply GmbH)
Ever wondered how geospatial data works? Why don't you come along and learn it where you'll be presented with a fully functioning geospatial application that uses metadata from images to pinpoint them on a map. You'll be introduced to a NoSQL tool and you'll learn the basics of NoSQL technologies in a fun and intuitive way. Along the way you'll experience geospatial data, full-stack application development using JavaScript and a little bit of semantic data as well. You will experience how easy it is to manage hybrid data (JSON documents, JPEG images as well as RDF triples) in one database, how to query geospatial data and how to work with JavaScript across a three-tier application.
From different ways of working (or the same) to working in a Scrum team. What is so different? Is it better? Does it bring value to the delivered software? What benefits does the developer find in the Scrum team, and what was missing in the previous experience? What are the challenges? What is most important to being agile? What if the team is distributed? What about people? What was the biggest surprise for the author?
This talk presents the author's experience of joining a Scrum team after several years of working in other ways (or maybe it was really the same way). He shares his experience by examining segments of software development across different ways of working.
The author brings his own view of the different components of development, from technical and organizational to social.
In the technical part he analyzes version control and the way it is used, technologies, and CI/CD, while in the organizational segment he analyzes issue tracking, task progress tracking, meetings, etc.
The author also shares his experience regarding the social component, such as collaboration inside and outside the team, the people in the team and their mindset, collaboration with the customer, management's impact on the team, the level of trust, and the Scrum process overall.
Caching is a frequently used and misused technique for speeding up performance, off-loading non-scalable or expensive infrastructure, scaling systems and coping with large processing peaks. In this talk Greg introduces you to caching and highlights the key points of caching theory that you should consider when applying it. Then we take a comprehensive look at how the new JCache standard standardises the use of caching in Java.
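To make the standard concrete, a minimal JSR-107 sketch (cache name and types are invented; any compliant provider on the classpath, such as Hazelcast or Ehcache, supplies the implementation):

import java.util.concurrent.TimeUnit;
import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

public class JCacheDemo {
    public static void main(String[] args) {
        CacheManager manager = Caching.getCachingProvider().getCacheManager();
        MutableConfiguration<String, Integer> config = new MutableConfiguration<String, Integer>()
                .setTypes(String.class, Integer.class)
                // entries silently expire one minute after creation
                .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(
                        new Duration(TimeUnit.MINUTES, 1)));
        Cache<String, Integer> prices = manager.createCache("prices", config);
        prices.put("book", 42);
        System.out.println(prices.get("book"));
    }
}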
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group ("MCG") expects demand to keep growing and supply to evolve, driven by institutional investment rotating out of offices and into work-from-home ("WFH") assets, and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to expect strong annual growth of 13% over the next four years.
Whilst competitive headwinds remain, exemplified by the recent second bankruptcy filing of Sungard, which blames "COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services", the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
Key findings include:
- Increased frequency and complexity of cyber threats.
- Escalation of state-sponsored and criminally motivated cyber operations.
- Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Techniques to optimize the pagerank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
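To make the first of these techniques concrete, here is a minimal single-threaded sketch of skipping converged vertices; the graph encoding and tolerance are chosen for illustration, and this is not the STICD implementation itself:

import java.util.Arrays;

public class PageRankSkip {
    // in[v] lists the in-neighbours of v; outdeg[u] is u's out-degree.
    // A vertex whose rank changed by less than tol is frozen and skipped
    // in later iterations (an approximation that saves iteration time).
    static double[] pagerank(int[][] in, int[] outdeg, double damping, double tol) {
        int n = in.length;
        double[] rank = new double[n], next = new double[n];
        Arrays.fill(rank, 1.0 / n);
        boolean[] converged = new boolean[n];
        boolean done = false;
        while (!done) {
            done = true;
            for (int v = 0; v < n; v++) {
                if (converged[v]) { next[v] = rank[v]; continue; } // skip the work
                double sum = 0;
                for (int u : in[v]) sum += rank[u] / outdeg[u];
                next[v] = (1 - damping) / n + damping * sum;
                if (Math.abs(next[v] - rank[v]) < tol) converged[v] = true;
                else done = false;
            }
            double[] tmp = rank; rank = next; next = tmp; // swap buffers
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] in = { {1, 2}, {2}, {0} }; // tiny 3-vertex example graph
        int[] outdeg = { 1, 1, 2 };
        System.out.println(Arrays.toString(pagerank(in, outdeg, 0.85, 1e-8)));
    }
}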
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Notes on graph algorithms like PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
9. Real-Time Analysis of a Superhero Fight Club
Data model from the slide diagram:
Fight (stream of events): hitter: Int, hittee: Int, hitpoints: Int
Segment (batch): id: Int, name: String, segment: String
Detail (batch): name: String, gender: Int, birthYear: Int, noOfAppearances: Int
Hero (Segment joined with Detail): id: Int, name: String, segment: String, gender: Int, birthYear: Int, noOfAppearances: Int
16. Read the CSV file and transform it
JavaRDD<String> segmentFile = sparkContext.textFile("s3://...");
JavaPairRDD<String, SegmentTableRecord> segmentTable = segmentFile // keyed by name, matching the join key below
.map(line -> line.split(","))
.filter(array -> array.length == 3)
.mapToPair((String[] parts) -> {
int id = Integer.parseInt(parts[0]);
String name = parts[1], segment = parts[2];
return new Tuple2<>(name, new SegmentTableRecord(id, name, segment));
});
Join with detail data, keep only human heroes and write the output
segmentTable.join(detailTable)
.mapValues(tuple -> {
SegmentTableRecord s = tuple._1();
DetailTableRecord d = tuple._2();
return new Hero(s.getId(), s.getName(), s.getSegment(),
d.getGender(), d.getBirthYear(), d.getNoOfAppearances());
})
.map(tuple -> tuple._2())
.filter(hero -> hero.getSegment().equals(HUMAN_SEGMENT))
.saveAsTextFile("s3://...");
17. Loading Files from S3 into POJOs
DataSource<SegmentTableRecord> segmentTable = env.readCsvFile("s3://...")
.ignoreInvalidLines()
.pojoType(SegmentTableRecord.class, "id", "name", "segment");
Join and Filter
DataSet<Hero> humanHeros = segmentTable.join(detailTable)
.where("name")
.equalTo("name")
.with((s, d) -> new Hero(s.id, s.name, s.segment,
d.gender, d.birthYear, d.noOfAppearances))
.filter(hero -> hero.segment.equals("Human"));
Write back to S3
humanHeros.writeAsFormattedText(outputTablePath, WriteMode.OVERWRITE,
h -> h.toCsv());
18. Performance
Terasort1: Flink ca. 66% of runtime
Terasort2: Flink ca. 68% of runtime
HashJoin: Flink ca. 32% of runtime
(Iterative processes: Flink ca. 50% of runtime, ca. 7% with delta iterations)
19. 2nd Round Points
Generally similar abstraction and feature set
Flink has a nicer syntax, more sugar
Spark is pretty bare-metal
Flink is faster
23. Create Context and get Avro Stream from Kafka
JavaStreamingContext jssc = new JavaStreamingContext(conf,
Durations.seconds(1));
HashSet<String> topicsSet = Sets.newHashSet("FightEventTopic");
HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "xxx:11211");
kafkaParams.put("group.id", "spark");
JavaPairInputDStream<String, FightEvent> kafkaStream =
KafkaUtils.createDirectStream(jssc, String.class, FightEvent.class,
StringDecoder.class, AvroDecoder.class, kafkaParams, topicsSet);
Analyze number of hit points over a sliding window
kafkaStream.map(tuple -> tuple._2().getHitPoints())
.reduceByWindow((hit1, hit2) -> hit1 + hit2,
Durations.seconds(60), Durations.seconds(10))
.foreachRDD((rdd, time) -> {
rdd.saveAsTextFile(outputPath + "/round1-" + time.milliseconds());
LOGGER.info("Hitpoints in the last minute {}", rdd.take(5));
return null;
});
24. Output
20:19:32 Hitpoints in the last minute [80802]
20:19:42 Hitpoints in the last minute [101019]
20:19:52 Hitpoints in the last minute [141012]
20:20:02 Hitpoints in the last minute [184759]
20:20:12 Hitpoints in the last minute [215802]
25. 3rd Round Points
Flink supports event time windows
Kafka and Avro worked seamlessly in both
Spark uses micro-batches, no real stream
Both have at-least-once delivery guarantees
Exactly-once depends a lot on sink/source
41. Development
Compared to Hadoop, both are awesome
Both provide unified programming model for diverse scenarios
Comfort level of abstraction varies with use-case
Spark's Java API is cumbersome compared to the Scala API
Working with both is fun
Docs are ok, but spotty
47. Use Spark, if
You have Cloudera, Hortonworks, etc. support and depend on it
You want to heavily use Graph and ML libraries
You want to use the more mature project
48. Use Flink, if
Real-Time processing is important for your use case
You want more complex window operations
You develop in Java only
If you want to support a German project