Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14) - Evan Chan
This was a talk that Kelvin Chu and I just gave at the SF Bay Area Spark Meetup 5/14 at Palantir Technologies.
We discussed the Spark Job Server (http://github.com/ooyala/spark-jobserver), its history, example workflows, architecture, and exciting future plans to provide HA Spark job contexts.
We also discussed the use case of the job server at Ooyala to facilitate fast query jobs using shared RDDs and a shared job context, and how we integrate with Apache Cassandra.
A presentation on our experience at Ingram Content Group with Grafana and MySQL. In an enterprise environment it is sometimes necessary to keep data in a traditional, general purpose SQL database such as MySQL or PostgreSQL. These slides explore the challenges and benefits of using Grafana with an SQL database in a large enterprise production setting.
The paper explains how you can write an interpreter and get an optimizing just-in-time (JIT) compiler for free. This enables language designers to focus on features without worrying about the complexities of compiler optimizations and code generation. This paper presents a Java Virtual Machine (JVM) that allows the application to control the JIT compiler behavior at runtime. We'll discuss how various programming languages can take advantage of this framework.
To intrigue compiler aficionados, the authors show how combining AST node rewriting during interpretation, optimization, and deoptimization produces high-performance code from the interpreter without a language-specific compiler. In addition, they present how features of a variety of programming languages, such as JavaScript, Ruby, Python, R and others, map onto the framework.
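The idea of AST node rewriting during interpretation can be illustrated with a toy sketch. This is not the framework's API: the classes `Const`, `Var`, and `Add` and the method names are hypothetical, and the "rewrite" is modeled by swapping the node's implementation rather than replacing the node in its parent. It only shows the pattern of specializing on observed types and falling back to a generic path when the assumption breaks.

```python
class Const:
    """Leaf node holding a constant value."""
    def __init__(self, value):
        self.value = value

    def execute(self, env):
        return self.value


class Var:
    """Leaf node that reads a variable from the environment dict."""
    def __init__(self, name):
        self.name = name

    def execute(self, env):
        return env[self.name]


class Add:
    """Addition node that specializes itself on the operand types it sees."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.impl = self._uninitialized  # current specialization

    def execute(self, env):
        return self.impl(env)

    def _uninitialized(self, env):
        # First execution: pick a specialization from the observed types.
        l, r = self.left.execute(env), self.right.execute(env)
        if isinstance(l, int) and isinstance(r, int):
            self.impl = self._int_add
        else:
            self.impl = self._generic_add
        return l + r

    def _int_add(self, env):
        # Specialized path: assumes both operands are ints.
        l, r = self.left.execute(env), self.right.execute(env)
        if not (isinstance(l, int) and isinstance(r, int)):
            # Assumption violated: fall back to the generic path.
            self.impl = self._generic_add
        return l + r

    def _generic_add(self, env):
        # Generic path: no type assumptions.
        return self.left.execute(env) + self.right.execute(env)


if __name__ == "__main__":
    node = Add(Var("x"), Const(1))
    print(node.execute({"x": 2}), node.impl.__name__)
    print(node.execute({"x": 1.5}), node.impl.__name__)
```

In a real system the specialized path is what a compiler would turn into machine code, and the fallback corresponds to deoptimization; here both are ordinary Python methods.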
This presentation deals with logging in the course of mobile development, describing an open source logging environment built with the ELK stack (Elasticsearch, Logstash and Kibana).
Presentation by Igor Rudyk (Software Engineer, GlobalLogic, Lviv), delivered at Mobile TechTalk Lviv on April 28, 2015.
More details - http://globallogic.com.ua/mobile-techtalk-lviv-2015-report
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se... - gethue
Livy is a new open source Spark REST server for submitting and interacting with your Spark jobs from anywhere. Livy is conceptually based on the incredibly popular IPython/Jupyter, but implemented to integrate better into the Hadoop ecosystem with multiple users. Spark can now be offered as a service to anyone in a simple way: Spark shells in Python or Scala can be run by Livy in the cluster while end users manipulate them at their own convenience through a REST API. Regular non-interactive applications can also be submitted. The output of the jobs can be introspected and returned in a tabular format, which makes it easy to visualize in charts. Livy can point to a single Spark cluster and create several contexts for its users. With YARN impersonation, jobs will be executed with the actual permissions of the users submitting them. Livy also enables the development of Spark Notebook applications. Those are ideal for quickly doing interactive Spark visualizations and collaboration from a Web browser! This talk is technical and details the architecture and design decisions taken for developing this server, as well as its internals. It also describes the alternatives we tried and the challenges that were faced. The capabilities of Livy will then be demoed live in Hue’s Notebook Application through a real-life scenario.
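As a rough illustration of driving a Spark shell through a REST API, the sketch below builds the two JSON requests typically used with Livy: one to create an interactive session and one to run a statement in it. The host and port in `LIVY_URL`, the helper names, and the sample Spark code are assumptions for illustration. The requests are only constructed, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumption: a Livy server reachable on localhost at port 8998.
LIVY_URL = "http://localhost:8998"


def build_request(path, payload):
    """Build (but do not send) a JSON POST request for a Livy endpoint."""
    return urllib.request.Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="pyspark"):
    """Request that asks Livy to start an interactive session of the given kind."""
    return build_request("/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    """Request that submits a code snippet to an existing session."""
    return build_request(f"/sessions/{session_id}/statements", {"code": code})


if __name__ == "__main__":
    req = run_statement_request(0, "sc.parallelize(range(100)).sum()")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a running server, the request could be sent with:
    # urllib.request.urlopen(req)
```

The session id used here is a placeholder; in practice it would come from the response to the session-creation request.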
https://spark-summit.org/eu-2015/events/building-a-rest-job-server-for-interactive-spark-as-a-service/
Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana) - Cohesive Networks
Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services.
"Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal
At the Hadoop in Taiwan 2013 event, an engineer from TCloud Computing presented the security concepts and features of Hadoop, how to script the Crypto API, configuration details, and future development.
There are plenty of out-of-the-box solutions ready to use as an edge service: Tyk.io, Zuul, or Spring Cloud Gateway, to name a few. Yet at Allegro we decided to build our own. Reinventing the wheel or filling a functionality gap? During the talk we want to share the qualities of the different API gateways available out there and why sometimes they are still not enough (or too much).
An introduction to the API for OnTime for IBM - ontimesuite
Presentation from the OnTime for IBM API workshop in Shinjuku, Tokyo, Japan on Thursday 19 November 2015. Please contact OnTime support either in Denmark or Japan for more information.
People using your web app also use many other online services. You'll often want to pull data from those other services into your app, or publish data from your app out to other services. In this talk, Randy will explain the terminology you need to know, share best practices and techniques for integrating, and walk through two real-world examples. You'll leave with code snippets to help you get started integrating.
The web has changed! Users spend more time on mobile than on desktops and they expect to have an amazing user experience on both platforms. APIs are the heart of the new web as the central point of access to data, encapsulating logic and providing the same data and the same features for desktops and mobiles.
In this talk, I will show you how in only 45 minutes we can create a full REST API, with documentation and an admin application built with React.
CouchDB presentation with some technical details, made for a technical audience. It shows use cases, a comparison with other NoSQL databases, and why it's useful for publishers.
Example-driven Web API Specification Discovery - Javier Canovas
Slides of my presentation at the European Conference on Modelling Foundations and Applications (ECMFA'17). To be presented during the session on Thursday 16:00-17:30.
To build up any non-trivial business processing, you may have to connect systems that are exposed by web-services, fire off events over message queues, notify users via email or social networking, and much more.
Apache Camel is a lightweight integration framework that helps you connect systems in a consistent and reliable way. Focus on the business reasons behind what's being integrated, not the underlying details of how.
Technical overview of three of the most representative key-value stores: Cassandra, Redis and CouchDB. Focused on Ruby and Ruby on Rails development, this talk shows how to solve common problems, the most popular libraries, benchmarking and the best use case for each one of them.
This talk was part of the Conferencia Rails 2009, Madrid, Spain.
http://app.conferenciarails.org/talks/43-key-value-stores-conviertete-en-un-jedi-master
An application programming interface (API) is a way for two different pieces of software to communicate with each other. In your WordPress plugins and themes, you’ll often want to pull data from or send data to a third-party service that has an API. In this talk, Randy will explain the terminology you need to know to get started, share best practices and techniques for integrating with APIs, and walk through two real-world examples. You’ll leave with code snippets to help you get started integrating.
REST API Security: OAuth 2.0, JWTs, and More! - Stormpath
Les Hazlewood, Stormpath CTO, already showed you how to build a Beautiful REST+JSON API, but how do you secure your API? At Stormpath, we spent 18 months researching best practices. Join Les as he explains how to secure your REST API, the right way. We'll also host a live Q&A session at the end.
Unleashing the Power of Data: Choosing a Trusted Analytics Platform - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Learn SQL from basic queries to Advance queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank ... Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is ...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).