This document describes an interactive batch query system for game analytics based on Apache Drill. It addresses the problem of answering common ad-hoc queries over large volumes of log data by using a columnar data model and optimizing query plans. The system utilizes Drill's schema-free data model and vectorized query processing. It further improves performance by merging similar queries, reusing intermediate results, and pushing execution downwards to utilize multi-core CPUs. This provides a unified solution for both ad-hoc and scheduled batch analytics workloads at large scale.
This document discusses RabbitMQ, an open-source message broker. It provides instructions for installing RabbitMQ on Debian/Ubuntu systems and accessing the RabbitMQ management interface. It also lists some common RabbitMQ concepts like virtual hosts, exchanges, queues, and bindings and provides links to RabbitMQ tutorials and examples in different programming languages.
Embracing Clojure: a journey into Clojure adoption - Luca Grulla
What happens when a small team of very experienced developers with no real functional programming experience decides to use Clojure to run a core system architecture component?
This is the story of my team's two-year journey with Clojure, sharing learnings, epiphanies, and successes, as well as some of the challenges we encountered.
The document discusses functional programming concepts in Clojure including immutable data structures like lists, vectors, and maps. It compares the functional style of Clojure to the object-oriented style of Java, showing how Clojure allows data to flow through transformations without side effects. Key points covered include Clojure's homoiconic nature, pure functions, and use of transformations and composition over iterative steps.
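The functional-versus-imperative contrast the summary describes can be sketched outside Clojure as well. A minimal Python analogue (illustrative only; the talk itself uses Clojure's sequence functions):

```python
from functools import reduce

orders = [
    {"item": "book", "qty": 2, "price": 12.0},
    {"item": "pen", "qty": 10, "price": 1.5},
    {"item": "desk", "qty": 1, "price": 90.0},
]

# Imperative style: a mutable accumulator updated step by step.
total = 0.0
for o in orders:
    if o["qty"] * o["price"] > 10:
        total += o["qty"] * o["price"]

# Functional style: data flows through pure transformations;
# nothing is mutated and there is no intermediate state to track.
line_totals = map(lambda o: o["qty"] * o["price"], orders)
big_lines = filter(lambda t: t > 10, line_totals)
total_fp = reduce(lambda a, b: a + b, big_lines, 0.0)

assert total == total_fp
print(total_fp)  # 129.0
```

Both styles compute the same answer; the functional version composes small transformations instead of threading state through a loop, which is the point the talk makes about Clojure.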
The document describes the main components of Production Planning and Control (PCP), including the development of a strategic business plan, a tactical production plan, a master production schedule, material requirements planning, capacity management, and online execution control, to help organizations improve performance and efficiency.
How to write a Neutron Plugin - if you really need to - salv_orlando
Slides for the talk by Salvatore Orlando and Armando Migliaccio at the OpenStack Summit - Fall 2013 in Hong Kong
Talk abstract: http://openstacksummitnovember2013.sched.org/event/c6478ecf54d639de3b8b9958bfe9d450#.UnLEI5ROpU0
Use Neutron instead of nova-network
● neutron_url = http://neutron:9696
● neutron_auth_strategy = keystone
● neutron_admin_auth_url = http://keystone:35357/v2.0
● neutron_admin_username = neutron
● neutron_admin_tenant_name = service
● neutron_admin_password = password
Nova interaction with Neutron
1. Create network, subnet, router, etc. via the Neutron API
2. Boot VM, pass network info to Neutron
3. Attach ports, floating IP via Neutron
4. On delete,
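The interaction steps above can be sketched as a toy mock in Python. The class and function names here are hypothetical stand-ins for illustration, not the real Nova or python-neutronclient APIs (real Nova talks to the Neutron REST endpoint configured as neutron_url):

```python
class FakeNeutron:
    """Stands in for the Neutron API: tracks networks and ports."""
    def __init__(self):
        self.networks = {}
        self.ports = {}
        self._next_id = 0

    def _new_id(self, prefix):
        self._next_id += 1
        return f"{prefix}-{self._next_id}"

    def create_network(self, name):          # step 1: networks made via Neutron API
        net_id = self._new_id("net")
        self.networks[net_id] = name
        return net_id

    def create_port(self, net_id, device):   # steps 2-3: on boot, Nova passes
        port_id = self._new_id("port")       # network info and attaches a port
        self.ports[port_id] = (net_id, device)
        return port_id

    def delete_port(self, port_id):          # step 4: cleanup when the VM goes away
        del self.ports[port_id]


def boot_vm(neutron, net_id, vm_name):
    """Nova-side sketch: ask Neutron for a port on the given network."""
    return neutron.create_port(net_id, vm_name)


neutron = FakeNeutron()
net = neutron.create_network("private")
port = boot_vm(neutron, net, "vm-1")
assert port in neutron.ports
neutron.delete_port(port)                    # VM deleted -> port released
assert port not in neutron.ports
```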
Clojure: Towards The Essence Of Programming (What's Next? Conference, May 2011) - Howard Lewis Ship
The document discusses Clojure and its approach to programming. It begins by defining the essence of programming as the intrinsic nature that determines a language's character. It then discusses how Clojure focuses on the essential aspects of programming by removing ceremony and focusing on data rather than objects or classes. The document uses examples to illustrate how Clojure allows for concise yet expressive code through its emphasis on data, immutability, and functional programming.
Ring provides a common abstraction for building web applications in Clojure. It defines handlers as functions that take HTTP requests as maps and return responses as maps. Adapters run handlers on web servers, and middleware can augment handlers. This allows writing web apps in an idiomatic way and sharing code across frameworks that target the Ring spec.
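Ring's handler-and-middleware model can be illustrated with a small Python analogue. The real spec is Clojure maps; the dict keys below mirror Ring's, but the code is only a sketch, not the actual spec:

```python
# Ring's core idea: a handler is a pure function from a request map
# to a response map, and middleware wraps a handler to make a new one.

def hello_handler(request):
    """Handler: request dict in, response dict out."""
    return {"status": 200,
            "headers": {"Content-Type": "text/plain"},
            "body": f"Hello, {request.get('remote-addr', 'world')}"}

def wrap_logging(handler):
    """Middleware: augments a handler without changing its shape."""
    def wrapped(request):
        response = handler(request)
        print(request["uri"], "->", response["status"])
        return response
    return wrapped

app = wrap_logging(hello_handler)
resp = app({"uri": "/", "request-method": "get", "remote-addr": "1.2.3.4"})
assert resp["status"] == 200
assert resp["body"] == "Hello, 1.2.3.4"
```

Because handlers and middleware are just functions over plain data, any adapter that produces the request map can run the same app, which is how Ring lets code be shared across frameworks.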
This document provides a summary of an introduction to the Clojure programming language. It discusses what Clojure is, its timeline and adoption, functional programming concepts, concurrency features using Software Transactional Memory, the Lisp ideology it is based on including homoiconicity and its macro system. It also provides an overview of getting started with Clojure including using the REPL, basic syntax like symbols and keywords, data types, sequences, functions, and Java interoperability. Resources for learning more about Clojure are also listed.
Using Clojure, NoSQL Databases and Functional-Style JavaScript to Write Next-... - Stefan Richter
The document is a presentation about building next-generation HTML5 apps using Clojure, NoSQL databases, and functional JavaScript. It discusses how the presenter built an HTML5 client for their software company's app using these technologies that runs on all modern browsers and mobile platforms like iPhone and Android, avoiding the need to build separate native apps. The presentation focuses on their use of functional programming principles in JavaScript to structure the client code.
The document discusses how abstraction is central to programming and how Clojure is a good language for creating abstractions, noting that Clojure provides primitive expressions, means of combination through functions, and means of abstraction through functions, records, multimethods and protocols to build complex programs from simple ideas.
DAMA Webinar - Big and Little Data Quality - DATAVERSITY
While technological innovation brings constant change to the data landscape, many organizations still struggle with the basics: ensuring they have reliable, high quality data. In health care, the promise of insight to be gained through analytics is dependent on ensuring the interactions between providers and patients are recorded accurately and completely. While traditional health care data is dependent on person-to-person contact, new technologies are emerging that change how health care is delivered and how health care data is captured, stored, accessed and used. Using health care as a lens through which to understand the emergence of big data, this presentation will ask the audience to think about data in old and new ways in order to gain insight about how to improve the quality of data, regardless of size.
This document discusses visualizing data with code and provides information on tools and techniques for data visualization. It lists relevant fields like information design, data science, and cartography. It also lists example visualization tools and techniques like D3, Processing, network graphs, and mapping. Finally, it outlines a process for developing data visualizations that involves looking at the data, creating initial visualizations, asking questions, getting inspiration, refining ideas, and publishing visualizations.
When working with big data or complex algorithms, we often look to parallelize our code to optimize runtime. By taking advantage of a GPU's 1000+ cores, a data scientist can quickly scale out solutions inexpensively and sometimes more quickly than with traditional CPU cluster computing. In this webinar, we will present ways to incorporate GPU computing to complete computationally intensive tasks in both Python and R.
See the full presentation here: 👉 https://vimeo.com/153290051
Learn more about the Domino data science platform: https://www.dominodatalab.com
An immersive workshop at General Assembly, SF. I typically teach this workshop at General Assembly, San Francisco. To see a list of my upcoming classes, visit https://generalassemb.ly/instructors/seth-familian/4813
I also teach this workshop as a private lunch-and-learn or half-day immersive session for corporate clients. To learn more about pricing and availability, please contact me at http://familian1.com
Hello Everyone !
"Salesforce Apex Hours" is a recurring event to talk about salesforce ! Some times we'd like to meet on one location and some time online. This time we are planning one online session on "Big Object" with Jigar Shah.
Agenda :-
1. Need for Big Objects
2. Consideration for Big Objects Usage
3. Demo
6. Limitations with using Big Objects
7. Q&A
8. Additional References
Speakers: Jigar Shah, Amit Chaudhary
Date: Saturday, Jan 27, 2018, 10:00 AM EST
Link: https://www.meetup.com/Farmington-Hills-Salesforce-Developer-Meetup/events/246658024/
Thanks
Amit Chaudhary @amit_sfdc
Email :- amit.salesforce21@gmail.com
To view a recording of this webinar, please use the URL below:
http://wso2.com/library/webinars/2016/06/analytics-in-your-enterprise/
Big data spans many fields and brings together technologies like distributed systems, machine learning, statistics and Internet of Things (IoT). It has now become a multi-billion dollar industry with use cases ranging from targeted advertising and fraud detection to product recommendations and market surveys.
Some use cases, such as urban planning, can be slower (done in batch mode), while others, such as the stock market, need results in milliseconds (done in a streaming fashion). Different technologies are used for each case: MapReduce for batch analytics, complex event processing for real-time analytics, and machine learning for predictive analytics. Furthermore, the type of analysis ranges from basic statistics to complicated prediction models.
This webinar will discuss the big data landscape, including:
Concepts, use cases and technologies
Capabilities and applications of the WSO2 analytics platform
WSO2 Data Analytics Server
WSO2 Complex Event Processor
WSO2 Machine Learner
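The batch-versus-streaming distinction described above can be made concrete with a tiny sketch (plain Python, no WSO2 components; the price values are made up):

```python
from collections import deque

# Batch vs streaming: a batch job computes over the whole data set at
# once, while a streaming job keeps an incremental answer up to date
# over a sliding window of recent events.

prices = [100, 101, 103, 99, 98, 102, 105]

# Batch: one pass over everything, result available only at the end.
batch_avg = sum(prices) / len(prices)

# Streaming: emit a moving average after every event (window of 3),
# so an answer is available in (near) real time as events arrive.
window = deque(maxlen=3)
moving = []
for p in prices:
    window.append(p)
    moving.append(sum(window) / len(window))

print(round(batch_avg, 2), [round(m, 2) for m in moving])
```

The batch average summarizes history; the streaming averages track the latest window, which is the trade-off behind choosing MapReduce-style versus CEP-style processing.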
Building the BI system and analytics capabilities at the company based on Rea... - GameCamp
How we built the BI system at the company, grew its capabilities, and developed our analytics capabilities: what was easy, what was more difficult, and the preparation and process, step by step. Presentation from the GameCamp webinar: http://www.gamecamp.io/events/webinar-growing-measurement-capabilities-of-gaming-and-apps-company/
The document describes Krist Wongsuphasawat's background and work in data visualization. It notes that he has a PhD in Computer Science from the University of Maryland, where he studied information visualization. He currently works as a data visualization scientist at Twitter, where he builds internal tools to analyze log data and monitor changes over time. Some of his projects include Scribe Radar, which allows users to search through and visualize client event data in order to find patterns and monitor effects of product changes. The document provides details on his approaches for dealing with large log datasets and visualizing user activity sequences.
Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc... - Mia Yuan Cao
Learn how to extract real-time insight from Big Data. JReport and ScaleDB’s combined solution delivers business value by ingesting Big Data at stunning velocity (millions of rows/second), then provides powerful visualizations, filtering and data analysis that enable you to draw quick conclusions to make agile business decisions. JReport's seamless connection to ScaleDB enables technical or non-technical users to build and modify their own reports and dashboards to visualize these vast data stores. Join us to see how.
This document summarizes Jon Hyman's presentation on using MongoDB for analytics at the NY MongoDB User Group. It discusses Appboy's use of pre-aggregated analytics documents to track time series data like app opens over time with breakdowns by dimension. It also covers Appboy's technique for quickly estimating the size of user segments by sampling random subsets of documents and extrapolating the results.
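The sampling technique described can be sketched as follows. The data, predicate, and sizes here are invented for illustration and are not Appboy's actual implementation:

```python
import random

# Estimate how many users fall in a segment by checking a random
# subset of user documents and extrapolating, instead of scanning
# every document.

random.seed(42)
N = 100_000
# Fake user documents: app_opens uniform in 0..9, so ~20% of users
# (those with 8 or 9 opens) satisfy the predicate below.
users = [{"app_opens": random.randint(0, 9)} for _ in range(N)]

def in_segment(user):
    return user["app_opens"] >= 8   # the segment predicate

SAMPLE = 2_000
sample = random.sample(users, SAMPLE)
hits = sum(1 for u in sample if in_segment(u))
estimate = hits / SAMPLE * N        # extrapolate from the sample

exact = sum(1 for u in users if in_segment(u))
print(estimate, exact)  # the estimate should land near the exact count
assert abs(estimate - exact) / exact < 0.15
```

A 2% sample answers the "how big is this segment?" question in a fraction of the work of a full scan, at the cost of a small, quantifiable error.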
[@IndeedEng] Large scale interactive analytics with Imhotep - indeedeng
Link to video: https://www.youtube.com/watch?v=IZ-kC6ut1Lg
In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large scale interactive analytics tools using the same platform. This has kept our engineering and product organizations focused on key metrics by analyzing test results. It also gives our marketing organization timely and accurate insight into our data - allowing us to identify opportunities, spot trends, and learn about our job seekers. In this talk, Zak Cocos, who leads our Marketing Sciences team, and Product Manager Tom Bergman will discuss and provide examples of the valuable insights that can be gained by using Imhotep with almost any data set.
This document discusses Badoo's use of MicroStrategy for business intelligence and analytics. It describes how MicroStrategy helped Badoo overcome challenges with their previous BI tool by providing dimensional modeling, self-service reports, and weekly releases. It highlights how MicroStrategy enabled data discovery, analysis delivery, and reporting for over 90 users across various teams. The document also provides examples of query optimizations in MicroStrategy that improved performance. Finally, it discusses how MicroStrategy has enabled Badoo to empower users through visual insights, transaction services, command manager automation, and streamlined web deployments.
Brian Greig gave a presentation on visualizing data in realtime using WebSockets and D3. He discussed collecting and consuming data from various sources, performing data analytics and visualizations using the DADA loop, using WebSockets for bidirectional data transmission, manipulating the DOM with D3 for data visualization, and presented a case study on building a simulation.
On Open Day, we share our activities of the month with each other and the community. It's when we take a step back and see where we stand. Here's our Open Day for August 2018.
Alter Way Big Data Seminar - Elasticsearch - October 2014 - ALTER WAY
This document discusses Elasticsearch and how it can be used to search, analyze, and make sense of large amounts of data. It provides examples of how Elasticsearch is being used by large companies to handle petabytes of data and gain insights. Implementations in France are highlighted. The document concludes by demonstrating how easily Elasticsearch can be deployed and used to ingest and search sample data.
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc... - Krist Wongsuphasawat
Slides from my talk at the IEEE Conference on Visual Analytics Science and Technology (VAST) 2014 in Paris, France.
ABSTRACT
Logging user activities is essential to data analysis for internet products and services.
Twitter has built a unified logging infrastructure that captures user activities across all clients it owns, making it one of the largest datasets in the organization.
This paper describes challenges and opportunities in applying information visualization to log analysis at this massive scale, and shows how various visualization techniques can be adapted to help data scientists extract insights.
In particular, we focus on two scenarios: (1) monitoring and exploring a large collection of log events, and (2) performing visual funnel analysis on log data with tens of thousands of event types.
Two interactive visualizations were developed for these purposes; we discuss design choices and the implementation of these systems, along with case studies of how they are being used in day-to-day operations at Twitter.
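Funnel analysis of the kind described in scenario (2) can be sketched in a few lines. The event names and funnel below are hypothetical, not Twitter's actual log schema:

```python
from collections import Counter

# For each user, count how far along an ordered funnel of events
# they got, then tally how many users reached each step.

funnel = ["home", "search", "job_view", "apply"]

# Per-user event streams, in time order.
logs = {
    "u1": ["home", "search", "job_view", "apply"],
    "u2": ["home", "search"],
    "u3": ["home", "job_view"],       # skipped "search": stops at step 1
    "u4": ["search", "home", "search", "job_view"],
}

def funnel_depth(events, steps):
    """Return how many funnel steps the user completed, in order."""
    depth = 0
    for e in events:
        if depth < len(steps) and e == steps[depth]:
            depth += 1
    return depth

counts = Counter(funnel_depth(ev, funnel) for ev in logs.values())
# Users reaching at least step k, for each step of the funnel:
reached = [sum(v for d, v in counts.items() if d >= k)
           for k in range(1, len(funnel) + 1)]
print(reached)  # [4, 3, 2, 1]
```

The real challenge the paper addresses is doing this interactively when there are tens of thousands of event types and billions of events, not four users; the counting logic, though, is essentially this.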
Machine learning with Spark: the road to production - Andrea Baita
1) The document discusses best practices for implementing machine learning models in production using Apache Spark, including prototyping models in notebooks but implementing them properly in Spark, testing models using BDD, and monitoring models and business metrics once deployed.
2) It provides an example case study of implementing a machine learning model to predict clicks for an advertising campaign, including the required architecture, model deployment approaches, and defining relevant metrics to monitor.
3) Releasing and deploying machine learning models to production requires tools for continuous delivery, monitoring failures and business metrics over time to ensure model quality and adapt the model based on new data.
Elasticsearch: breakfast briefing of March 13, 2014 - ALTER WAY
Elasticsearch is a very powerful open-source search engine based on Apache Lucene. It enables the indexing of millions of records and their search and analysis in real time. Elasticsearch tools are already used by leading companies such as FourSquare, GitHub, OpenDataSoft, and Dailymotion. Alter Way and Elasticsearch invite you to come discover the Elasticsearch suite, now available in version 1.0 and ready for production!
This document summarizes a presentation given in September 2013 by Archana Joshi, a senior manager at Cognizant, and Zaheer Abbas Contractor, head of AgileNext at Wipro Technologies. The presentation covered Agile basics such as the primary goal of Agile development being working software, critical items to start a Scrum project, and the correct sequence of events in the Scrum framework. It also discussed concepts like what a product backlog item, sprint burn-down charts, and the product owner's role. The document provided examples and explanations to build understanding of foundational Agile and Scrum terminology and practices.
2-1 Remember the Help Desk with AFCU - Jared Flanders, Final - Jared Flanders
Jared Flanders, a Systems Monitoring Engineer at America First Credit Union, presented on the credit union's ITSM journey and their experience implementing the HPE Service Anywhere platform. Some key points:
- America First Credit Union previously used HP Service Desk and Service Manager but wanted to avoid constant SM upgrades. They implemented Service Anywhere in 2015 after a proof of concept showed how it could meet their needs.
- Implementation took around 8 weeks and initially focused on help desk, IT support, and integrations with UCMDB and Connect-It. Additional groups like DBAs and computer operations were onboarded later.
- In the past year, they have added over 200 knowledge articles, automated a
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics - Amazon Web Services
Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze your data with existing BI tools for a fraction of the cost of traditional data warehouses.
This webinar will familiarize you with reporting, visualization, and business intelligence options for your Amazon Redshift data warehouse. You will learn how to effectively use existing BI tools and SQL clients with your Amazon Redshift data warehouse, as well as techniques for performing advanced analytics.
Learning Objectives:
Options for processing, analyzing, and visualizing data in Amazon Redshift
Extending the Amazon Redshift SQL query capabilities
Optimizing query performance with Redshift ODBC / JDBC driver
Overview of BI solutions from our partners
Before vs After: Redesigning a Website to be Useful and Informative for Devel...Teresa Giacomini
There are so many fun challenges in creating a useful website for a developer audience today: you’ve got to empathize with your audience, nail the voice, understand the “jobs” your site’s visitors are trying to accomplish, make sure you anticipate (and answer!) the questions people are likely to have. In this quick lightning talk, I’ll share some before vs. after pics of a recent Citus Data site redesign—and will share some of the best practices we used, based on my years as a developer, software engineering manager, product manager, and now, as a marketer.
Similar to 穆黎森:Interactive batch query at scale (20)
詹剑锋:Big databench—benchmarking big data systemshdhappy001
This document discusses BigDataBench, an open source project for big data benchmarking. BigDataBench includes six real-world data sets and 19 workloads that cover common big data applications and preserve the four V's of big data. The workloads were chosen to represent typical application domains like search engines, social networks, and e-commerce. BigDataBench aims to provide a standardized benchmark for evaluating big data systems, architectures, and software stacks. It has been used in several case studies for workload characterization and performance evaluation of different hardware platforms for big data workloads.
The document discusses big data visualization and visual analysis, focusing on the challenges and opportunities. It begins with an overview of visualization and then discusses several challenges in big data visualization, including integrating heterogeneous data from different sources and scales, dealing with data and task complexity, limited interaction capabilities for large data, scalability for both data and users, and the need for domain and development libraries/tools. It then provides examples of visualizing taxi GPS data and traffic patterns in Beijing to identify traffic jams.
Spark is an open source cluster computing framework originally developed at UC Berkeley. Intel has made many contributions to Spark's development through code commits, patches, and collaborating with the Spark community. Spark is widely used by companies like Alibaba, Baidu, and Youku for large-scale data analytics and machine learning tasks. It allows for faster iterative jobs than Hadoop through its in-memory computing model and supports multiple workloads including streaming, SQL, and graph processing.
刘诚忠:Running cloudera impala on postgre sqlhdhappy001
This document summarizes a presentation about running Cloudera Impala on PostgreSQL to enable SQL queries on large datasets. Key points:
- The company processes 3 billion daily ad impressions and 20TB of daily report data, requiring a scalable SQL solution.
- Impala was chosen for its fast performance from in-memory processing and code generation. The architecture runs Impala coordinators and executors across clusters.
- The author hacked Impala to also scan data from PostgreSQL for mixed workloads. This involved adding new scan node types and metadata.
- Tests on a 150 million row dataset showed Impala with PostgreSQL achieving 20 million rows scanned per second per core.
Introducing BoxLang : A new JVM language for productivity and modularity!Ortus Solutions, Corp
Just like life, our code must adapt to the ever changing world we live in. From one day coding for the web, to the next for our tablets or APIs or for running serverless applications. Multi-runtime development is the future of coding, the future is to be dynamic. Let us introduce you to BoxLang.
Dynamic. Modular. Productive.
BoxLang redefines development with its dynamic nature, empowering developers to craft expressive and functional code effortlessly. Its modular architecture prioritizes flexibility, allowing for seamless integration into existing ecosystems.
Interoperability at its Core
With 100% interoperability with Java, BoxLang seamlessly bridges the gap between traditional and modern development paradigms, unlocking new possibilities for innovation and collaboration.
Multi-Runtime
From the tiny 2m operating system binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, Web Assembly, Android and more. BoxLang has been designed to enhance and adapt according to it's runnable runtime.
The Fusion of Modernity and Tradition
Experience the fusion of modern features inspired by CFML, Node, Ruby, Kotlin, Java, and Clojure, combined with the familiarity of Java bytecode compilation, making BoxLang a language of choice for forward-thinking developers.
Empowering Transition with Transpiler Support
Transitioning from CFML to BoxLang is seamless with our JIT transpiler, facilitating smooth migration and preserving existing code investments.
Unlocking Creativity with IDE Tools
Unleash your creativity with powerful IDE tools tailored for BoxLang, providing an intuitive development experience and streamlining your workflow. Join us as we embark on a journey to redefine JVM development. Welcome to the era of BoxLang.
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Session 1 - Intro to Robotic Process Automation.pdfUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge - - Capture & Transfer
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
6. The Problem
• How many logins today?
• How many individual users this week?
• Total income today?
• Paid user amount this month?
• …
7. The Problem: Facts
• How many X during time period of Y?

Fact Table:
user id  | event | amount | timestamp
user_001 | login | -      | 1383729081
user_002 | login | -      | 1383729082
user_001 | pay   | 4.99   | 1383729084
user_003 | login | -      | 1383729090
8. The Problem: Facts
• How many logins today?
• How many individual users this week?
• Total income today?
• Paid user amount this month?
• …
9. The Problem: Facts
• How many logins today?

(Fact Table as on slide 7)

select count(*) from fact where event='login' and date(timestamp)='2013-12-06';
10. The Problem: Facts
• How many individual users this week?

(Fact Table as on slide 7)

select count(distinct uid) from fact where event='login' and timestamp>='?' and timestamp<'?';
11. The Problem: Facts
• Total income today?

(Fact Table as on slide 7)

select sum(amount) from fact where event='pay' and timestamp>='?' and timestamp<'?';
12. The Problem: Facts
• Paid user amount this month?

(Fact Table as on slide 7)

select count(distinct uid) from fact where event='pay' and timestamp>='?' and timestamp<'?';
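The four fact-table queries on slides 9-12 can be run end to end on a small stand-in. A minimal sketch, using an in-memory SQLite database instead of Drill, with table and column names (`fact`, `uid`, `ts`) adapted from the slides' example rows:

```python
# Sketch: the slides' fact-table queries on an in-memory SQLite stand-in.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact (uid TEXT, event TEXT, amount REAL, ts INTEGER)")
con.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", [
    ("user_001", "login", None, 1383729081),
    ("user_002", "login", None, 1383729082),
    ("user_001", "pay",   4.99, 1383729084),
    ("user_003", "login", None, 1383729090),
])
window = (1383729081, 1383729091)  # [start, end) of the time period

# How many logins in the window?
logins = con.execute(
    "SELECT count(*) FROM fact WHERE event='login' AND ts>=? AND ts<?",
    window).fetchone()[0]

# How many individual users logged in?
users = con.execute(
    "SELECT count(DISTINCT uid) FROM fact WHERE event='login' AND ts>=? AND ts<?",
    window).fetchone()[0]

# Total income in the window?
income = con.execute(
    "SELECT sum(amount) FROM fact WHERE event='pay' AND ts>=? AND ts<?",
    window).fetchone()[0]

print(logins, users, income)  # 3 3 4.99
```

Each question is a different aggregate over the same scan and filter shape, which is what the later "merge similar scans" optimization exploits.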
13. The Problem: Dimensions
• How many logins today from China?
• How many individual users of each server this week?
• Total income today by new user?
• Paid user amount this month from Adwords?
• …
14. The Problem: Dimensions
• The user X's property Y is of value Z

Dimension Table:
user id  | reg_time | language | refer
user_001 | 20100612 | en       | adwords
user_002 | 20110927 | cn       | facebook
user_003 | 20121010 | fr       | admob
user_004 | 20130522 | it       | tapjoy
…
15. Fact & Dimension
• Aggregation on Join

Fact:
user id  | event | amount | timestamp
user_001 | login | -      | 1383729081
user_002 | login | -      | 1383729082
user_001 | pay   | 4.99   | 1383729084
user_003 | login | -      | 1383729090

Dimension:
user id  | reg_time | language | refer
user_001 | 20100612 | en       | adwords
user_002 | 20110927 | cn       | facebook
user_003 | 20121010 | fr       | admob
user_004 | 20130522 | it       | tapjoy
…
16. Fact & Dimension
• How many logins today from China?
• How many individual users of each server this week?
• Total income today by new user?
• Paid user amount this month from adwords?
• …
17. Fact & Dimension
SELECT COUNT DISTINCT (on uid)
JOIN (1 fact, n dimension, on uid)
WHERE (filter by value of dimensions/facts)
GROUP BY (value of dimension)
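The generic COUNT DISTINCT / JOIN / WHERE / GROUP BY shape above can be exercised on the slides' sample tables. A sketch, again using SQLite as an illustrative stand-in for Drill, grouping distinct login users by one dimension value (language):

```python
# Sketch: aggregation on a fact-dimension join (SQLite stand-in for Drill).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact (uid TEXT, event TEXT, amount REAL, ts INTEGER)")
con.execute("CREATE TABLE dim (uid TEXT, reg_time TEXT, language TEXT, refer TEXT)")
con.executemany("INSERT INTO fact VALUES (?,?,?,?)", [
    ("user_001", "login", None, 1383729081),
    ("user_002", "login", None, 1383729082),
    ("user_001", "pay",   4.99, 1383729084),
    ("user_003", "login", None, 1383729090),
])
con.executemany("INSERT INTO dim VALUES (?,?,?,?)", [
    ("user_001", "20100612", "en", "adwords"),
    ("user_002", "20110927", "cn", "facebook"),
    ("user_003", "20121010", "fr", "admob"),
    ("user_004", "20130522", "it", "tapjoy"),
])

# SELECT COUNT DISTINCT (on uid) / JOIN (1 fact, 1 dimension, on uid)
# / WHERE (filter on fact) / GROUP BY (value of dimension):
rows = con.execute("""
    SELECT d.language, count(DISTINCT f.uid)
    FROM fact f JOIN dim d ON f.uid = d.uid
    WHERE f.event = 'login'
    GROUP BY d.language
    ORDER BY d.language
""").fetchall()
print(rows)  # [('cn', 1), ('en', 1), ('fr', 1)]
```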
18. Fact & Dimension
• SQL -> Syntax tree -> Logical Plan -> Physical Plan

agg
└─ Join
   ├─ Join
   │  ├─ filter
   │  │  └─ scan: Dimension
   │  └─ filter
   │     └─ scan: Dimension
   └─ filter
      └─ scan: Fact
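The plan tree on this slide can be represented as a small operator tree. A toy sketch with a hypothetical `Op` node class (Drill's real plans are operator DAGs serialized as JSON, not these classes):

```python
# Toy operator tree mirroring the slide's physical plan.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)

plan = Op("agg", [
    Op("Join", [
        Op("Join", [
            Op("filter", [Op("scan: Dimension")]),
            Op("filter", [Op("scan: Dimension")]),
        ]),
        Op("filter", [Op("scan: Fact")]),
    ]),
])

def leaves(op):
    # Collect the scan operators at the bottom of the plan.
    return [op.name] if not op.children else [n for c in op.children for n in leaves(c)]

print(leaves(plan))  # ['scan: Dimension', 'scan: Dimension', 'scan: Fact']
```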
32. about Space Efficiency
• Compact data representation
  • Java object overhead: high
• JVM friendly (GC)
  • Simpler object graph
  • Less tenured space, less full GC
33. about Time Efficiency
• Cache friendly
  • data access locality
• Superscalar: pipeline friendly
  • the inner loop problem
• SIMD friendly
  • opportunity to operate on a vector of values
• JVM friendly (JNI)
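The inner-loop point above can be shown with a toy columnar layout: one contiguous array per column, so the filter-and-sum loop touches primitives sequentially instead of chasing per-row object pointers. This is only an illustration; Drill's value vectors live off-heap, not in Python lists:

```python
# Toy columnar ("value vector") layout: one array per column.
from array import array

event  = ["login", "login", "pay", "login"]        # event column
amount = array("d", [0.0, 0.0, 4.99, 0.0])         # amount column, packed doubles

# Tight inner loop over the columns: cache friendly, easy to pipeline.
total = 0.0
for i in range(len(event)):
    if event[i] == "pay":
        total += amount[i]

print(total)  # 4.99
```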
38. Review the Considerations
• Cache friendly
• Superscalar: pipeline friendly
• SIMD friendly
• Compact data representation
• JVM friendly (GC)
• JVM friendly (JNI)

(figure: example value vectors: name:VarChar "ice cream…", price.basic:float 4.99, price.coupon:boolean T…)
43. Adhoc batch query

Fact:
user id | event | time
user_13 | login | 2013-07-26
user_13 | login | 2013-07-26
user_76 | pay   | 2013-07-27

Dimension:
user id | nation
user_13 | cn
user_76 | en

DAU:
   | 2013-07-26 | 2013-07-27
en | 576        | 491
cn | 361        | 945
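The scheduled "DAU by nation" batch result above is one join-plus-group-by over the fact and dimension tables. A sketch on the slide's (tiny) sample rows, using SQLite as a stand-in; the real counts in the DAU table come from far more data:

```python
# Sketch: DAU by nation and day as a single join + group-by (SQLite stand-in).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact (uid TEXT, event TEXT, day TEXT)")
con.execute("CREATE TABLE dim (uid TEXT, nation TEXT)")
con.executemany("INSERT INTO fact VALUES (?,?,?)", [
    ("user_13", "login", "2013-07-26"),
    ("user_13", "login", "2013-07-26"),
    ("user_76", "pay",   "2013-07-27"),
])
con.executemany("INSERT INTO dim VALUES (?,?)", [
    ("user_13", "cn"),
    ("user_76", "en"),
])

dau = con.execute("""
    SELECT d.nation, f.day, count(DISTINCT f.uid)
    FROM fact f JOIN dim d ON f.uid = d.uid
    GROUP BY d.nation, f.day
    ORDER BY d.nation, f.day
""").fetchall()
print(dau)  # [('cn', '2013-07-26', 1), ('en', '2013-07-27', 1)]
```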
58. Jobs vs Predictions
• Offline job
  • becomes predictions of what data the user may be interested in
  • by merging more queries together
  • daily predictions & hourly predictions
61. Utilising Multi-core
• Now:
  • Push data from Leaf
  • Data driven upwards
  • Pooled execution

agg
└─ Join
   ├─ filter nation='en'
   │  └─ scan: Dimension
   └─ filter date='2013-07-26'
      └─ scan: Fact
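The push-from-leaf idea above can be caricatured in a few lines: partition the fact data, let pooled workers run the leaf scan+filter, and merge partial results upwards. This is only a toy with made-up partitions; Drill schedules operator fragments, not Python threads:

```python
# Toy "push data from leaf, pooled execution": partitioned scan + merge.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of the fact data (uid, event, day).
partitions = [
    [("user_001", "login", "2013-07-26")] * 3,
    [("user_002", "login", "2013-07-26")] * 2,
    [("user_003", "pay",   "2013-07-27")] * 4,
]

def scan_and_filter(part):
    # Leaf operator: scan one partition, apply the filter, emit a partial count.
    return sum(1 for _, event, day in part
               if event == "login" and day == "2013-07-26")

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(scan_and_filter, partitions))

print(sum(partials))  # 5
```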
62. Adhoc batch query
• Benefits
  • Reduce the same Scans
  • Merge similar Scans
  • Merge intermediate operators
  • Unified process for adhoc & batch process
  • Multi-core process of single Plan
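The "merge similar scans" benefit can be sketched directly: several queries that share the same scan are answered in one pass over the data instead of one pass each. A toy illustration over the slides' row format (in the real system this merging happens at the plan level):

```python
# Toy merged scan: three aggregates computed in a single pass over the facts.
rows = [
    ("user_001", "login", None, 1383729081),
    ("user_002", "login", None, 1383729082),
    ("user_001", "pay",   4.99, 1383729084),
    ("user_003", "login", None, 1383729090),
]

login_count = 0      # "how many logins?"
login_users = set()  # "how many individual users?"
income = 0.0         # "total income?"

for uid, event, amount, ts in rows:  # one shared scan of the fact data
    if event == "login":
        login_count += 1
        login_users.add(uid)
    elif event == "pay":
        income += amount

print(login_count, len(login_users), income)  # 3 3 4.99
```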
64. About Xingcloud
• Now
  • 2 billion insert/update daily
  • 200k+ aggregation data/day, 6k sec in total
  • query response time: <1 sec to 100 sec, ~10 sec on average
  • http://a.xingcloud.com
• Future
  • Plan Merge
  • Unified process for batch, adhoc & stream process, SQL oriented
  • SQL(t): Plan with time window
65. About Drill
• Now
  • on Parquet/ORCFile on HDFS
  • Distributed Join
  • Write interface of storage engines
• Future
  • 1.0 M2: December 2013
  • 1.0 GA: Early 2014
  • more detail on https://issues.apache.org/jira/browse/DRILL