The data model for various types of networks (social, knowledge, control, etc.) is well understood, but representing events (such as page views) in a graph is a challenge with no clear best solution.
3. Graph DBs have some key advantages over relational DBs
• People tend to intuitively visualise concepts as graphs, and those concepts do not translate nicely to a tabular structure.
• Graph databases are often designed for low-latency performance, which can make them a better choice for certain applications, such as recommendation engines, especially at scale.
• There are some questions (for example, path analysis) that are difficult to answer with a relational database, but easy to answer with a graph.
4. We know how to model events in a table…
• In a relational event data model, each event is a record in a table or index.
• The table has as many columns or properties as there are facets to that event, e.g. user, timestamp, URL, etc.
• There isn’t much scope for deviation from this basic model.
5. … but modelling events as a graph is relatively unexplored
• We can model events as nodes with properties, related through [:NEXT] edges: (PageView1)-[:NEXT]->(PageView2).
• We can model events as relationships, e.g. (User)-[:VIEWS]->(Page).
• We can mix and match different methods.
• It’s not obvious which model is ‘the right one’, if there even is such a thing.
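To make both options concrete, here is a minimal Cypher sketch of each (the labels, property names and timestamps are hypothetical, not taken from any particular schema):

// Events as nodes, chained into a linked list via [:NEXT]
CREATE (pv1:PageView {url: '/home', at: datetime('2024-01-01T12:00:00')})
CREATE (pv2:PageView {url: '/pricing', at: datetime('2024-01-01T12:01:00')})
CREATE (pv1)-[:NEXT]->(pv2)

// The same activity modelled as a relationship between entities
CREATE (u:User {id: 'u1'})-[:VIEWS {at: datetime('2024-01-01T12:00:00')}]->(p:Page {url: '/home'})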
6. Choosing the model dictates what queries we can run
• In a relational database, you’d always use the same query to get specific properties of an event:
SELECT user_id, page_url FROM events;
• However, in a graph database, your query syntax depends on whether those dimensions have been modelled as nodes, relationships, or properties of nodes or relationships:
MATCH (u:User)-[:VIEWS]->(p:Page)
RETURN u.id, p.url;
MATCH (e:Event)
RETURN e.user_id, e.page_url;
7. Taking an event-grammar approach
In the event-grammar model, an event is a snapshot of a set of entities in time.
This model is already a graph, with nodes representing the various entities and relationships between the nodes.
However, when mapping this model to a tabular structure, the roles of each entity and the relationships between them are lost to users without knowledge of the model and the domain.
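As a hedged illustration, one event in this grammar might be sketched in Cypher like so, with the entity roles expressed as relationship types (the User, Device and Page labels and the relationship names are all hypothetical):

// One event as a snapshot of entities: a user, on a device, viewing a page at a point in time
CREATE (u:User {id: 'u1'})-[:VIEWS {at: datetime('2024-01-01T12:00:00')}]->(:Page {url: '/home'})
CREATE (u)-[:USES]->(:Device {type: 'mobile'})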
8. To make the roles and relationships explicit, we have to interpret the event
9. This model makes it hard to find all events by the same user
MATCH (u:User)-[r:VIEWS]->(p:Page)
WHERE u.email = 'alice@mail.com'
RETURN COUNT(r)
But what if we have more than just page views, e.g. also link clicks, downloads, form submits, etc.?
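Then a query for all of a user’s activity has to enumerate every relationship type explicitly, along these lines (the extra relationship type names are hypothetical):

MATCH (u:User)-[r:VIEWS|CLICKS|DOWNLOADS|SUBMITS]->()
WHERE u.email = 'alice@mail.com'
RETURN COUNT(r)

Every new event type means revisiting every such query.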
10. The event graph approach
A popular option for modelling events in a graph is to make each event a node that is related to the events that happened immediately before and after it through a NEXT / PREVIOUS relationship.
The event node then has outgoing HAS relationships to all of its entities, such as user nodes, context nodes, etc.
This model makes path analysis easy.
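For instance, following a trail of events is a single variable-length traversal along [:NEXT]; a minimal sketch, assuming a hypothetical id property on events:

// Follow the chain of events forward from one event, up to ten steps
MATCH path = (e:Event {id: 'e1'})-[:NEXT*1..10]->(:Event)
RETURN path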
11. The event-graph model makes other queries harder
MATCH (u:User)<-[:HAS]-(e:Event)-[:HAS]->(p:Page)
WHERE u.email = 'alice@mail.com'
RETURN COUNT(DISTINCT p)
We can still find all pages visited by the user, but we always have to add a ‘hop’ in the query, because entities are related to each other only through the event node they belong to.
12. The ‘denormalised’ graph approach
There is also the option to “denormalise” the data, i.e. represent the same data in different ways.
An example would be a model where each event is a node in a time series, with outgoing relationships to all its entities, but there are also relationships between the entities.
This adds complexity and redundancy to the model but makes queries easier.
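One way to populate the denormalised relationships is to derive them from the event nodes already in the graph; a minimal sketch, assuming the HAS relationships from the event-graph model and a hypothetical type property on events:

// Derive a direct User->Page relationship from each page-view event
MATCH (u:User)<-[:HAS]-(e:Event {type: 'page_view'})-[:HAS]->(p:Page)
MERGE (u)-[:VIEWS]->(p)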
13. Now we can easily write a variety of queries
MATCH (u:User)-[:VIEWS]->(p:Page)
WHERE u.email = 'alice@mail.com'
RETURN COUNT(DISTINCT p)
MATCH p = (u:User)<-[:HAS]-(:Event)-[:NEXT*1..5]->(:Event)
WHERE u.email = 'alice@mail.com'
RETURN p
14. How is modelling event-level data as a graph valuable?
One key advantage is that any insight you glean from analysing the relationships of entities in your events can be readily attached to your existing data set.
15. To illustrate, here’s a popular use case: an Identity Resolution Graph
• Users log in with different accounts
• On multiple devices
• Across multiple networks
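Sketched in Cypher, such an identity graph could look like this (the Account and Device labels and the SAME_AS relationship are hypothetical, not a fixed schema):

// Two accounts and a shared device resolved to the same person
CREATE (a1:Account {email: 'alice@mail.com'})
CREATE (a2:Account {email: 'alice@work.com'})
CREATE (d:Device {id: 'device-42'})
CREATE (a1)-[:SAME_AS]->(a2)
CREATE (a1)-[:USES]->(d)
CREATE (a2)-[:USES]->(d)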
16. An identity graph is powerful, but it remains locked away from the rest of your relational data
You need to write extra code to:
• Check the Identity Graph for all aliases of a specific user / device / network;
• Fill in those aliases in SQL queries against your relational database;
• Union the results of those queries.
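For example, the first step alone is its own graph query (hypothetical schema as above), whose results then have to be spliced by hand into SQL against the relational store:

// Fetch every known alias of a user from the identity graph
MATCH (a:Account {email: 'alice@mail.com'})-[:SAME_AS*0..]-(alias:Account)
RETURN DISTINCT alias.email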
17. Contrast that with building the ID graph on top of your existing event graph
You will be able to easily:
• Find all events for a specific user / device / network;
• Build relationships that link all known aliases for this user / device / network to the same events;
• Quickly discover all of the user / device / network history, regardless of which alias they are using at the moment.
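A hedged sketch of that combined traversal, reusing the hypothetical SAME_AS and HAS relationships from the earlier examples:

// All events for any alias of this user, however they logged in
MATCH (a:Account {email: 'alice@mail.com'})-[:SAME_AS*0..]-(alias:Account)
MATCH (alias)<-[:HAS]-(e:Event)
RETURN DISTINCT e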
19. If you would like to explore how Snowplow can enable you to take control of your data, and what that can make possible, visit our product page, request a demo or get in touch.
Sign up to our mailing list and stay up to date with our new releases and other news.