This document discusses how to use Google Trends for client optimization. It explains that Google Trends allows you to see trending topics, determine keyword popularity and strength, geo-target keywords to specific locations, and get ideas for content creation based on trends. Content creation managers and specialists can use Google Trends data to develop and implement strategies for creating relevant content and optimizing keywords.
This introductory-level talk is about Apache Flink: a multi-purpose Big Data analytics framework leading a movement towards the unification of batch and stream processing in open source.
With the many technical innovations it brings, along with its unique vision and philosophy, it is considered the 4G (fourth generation) of Big Data analytics frameworks, providing the only hybrid (real-time streaming + batch) open-source distributed data processing engine that supports many use cases: batch, streaming, relational queries, machine learning, and graph processing.
In this talk, you will learn about:
1. What is the Apache Flink stack, and how does it fit into the Big Data ecosystem?
2. How does Apache Flink integrate with Hadoop and other open-source tools for data input and output as well as deployment?
3. Why is Apache Flink an alternative to Apache Hadoop MapReduce, Apache Storm, and Apache Spark?
4. Who is using Apache Flink?
5. Where can you learn more about Apache Flink?
Overview of Apache Flink: Next-Gen Big Data Analytics Framework (Slim Baltagi)
These are the slides of my talk on June 30, 2015, at the first event of the Chicago Apache Flink meetup. Although most of the current buzz is about Apache Spark, the talk shows how Apache Flink offers the only hybrid (real-time streaming + batch) open-source distributed data processing engine supporting many use cases: real-time stream processing, machine learning at scale, graph analytics, and batch processing.
In these slides, you will find answers to the following questions: What is the Apache Flink stack, and how does it fit into the Big Data ecosystem? How does Apache Flink integrate with Apache Hadoop and other open-source tools for data input and output as well as deployment? What is the architecture of Apache Flink? What are the different execution modes of Apache Flink? Why is Apache Flink an alternative to Apache Hadoop MapReduce, Apache Storm, and Apache Spark? Who is using Apache Flink? Where can you learn more about Apache Flink?
MemSQL 201: Advanced Tips and Tricks Webcast (SingleStore)
This document summarizes a webinar on advanced tips and tricks for MemSQL. It discusses the differences between the rowstore and columnstore storage models and when each is best used. It also covers real-time data ingestion with MemSQL Pipelines, data sharding, and query tuning techniques such as using reference tables. Additionally, it discusses monitoring memory usage, workload management using management views, and query optimization tools for analyzing and optimizing tables.
Developing Kafka Streams Applications with Upgradability in Mind with Neil Bu... (HostedbyConfluent)
Does your organization struggle with updating its Kafka Streams applications? Releasing a new version of a Kafka Streams application can be challenging, especially if its state has to be preserved between releases. Consider these best practices and architectural ideas to make this process smoother and improve your release process.
Situations such as the accidental removal of changelog topics or the need to expand partitions are much easier to handle with some planning, and with proper planning, application upgrades become easier too.
Key takeaways from the session include:
* How to minimize rebuilding of state stores.
* How to change stream topologies without affecting existing state stores.
* What you can do when you absolutely need to increase the number of partitions within your application.
* How to leverage schemas for application releases.
* Measures to prevent data corruption, especially if Kafka is not only your system of record but also your source of truth.
* Techniques to support rolling back an application.
* The advantages of splitting a Kafka Streams application into multiple applications.
Real-time Analytics with Trino and Apache Pinot (Xiang Fu)
Trino Summit 2021:
An overview of the Trino Pinot connector, which combines the flexibility of Trino's full SQL support with the power of Apache Pinot's real-time analytics, giving you the best of both worlds.
Graph Structure in the Web - Revisited. WWW2014 Web Science Track (Chris Bizer)
The document discusses research that revisits the graph structure of the web using a new large crawl from Common Crawl. It finds that the web has become denser and more connected over time, with the largest strongly connected component growing significantly. While previous research found power laws for in- and out-degrees, the degree distributions in this crawl are heavy-tailed but do not fit power laws. The shape of the bow-tie structure also depends on the specific crawl used. The authors provide the new crawl data and analysis to enable further research on the evolving structure of the web graph.
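The bow-tie picture divides the web graph around its largest strongly connected component (the core): pages that can reach the core form the IN set, and pages reachable from it form the OUT set. As a minimal sketch, assuming the graph fits in memory as a Python edge list (a crawl the size of Common Crawl would need distributed or external-memory processing), the decomposition and an in-degree histogram can be computed as follows. The function names are illustrative, not from the authors' tooling, and tendrils, tubes, and disconnected pieces are not separated out.

```python
from collections import Counter, defaultdict

def strongly_connected_components(edges):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in visited:
                    visited.add(child)
                    stack.append((child, iter(graph[child])))
                    break
            else:  # all children explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for parent in reverse[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    stack.append(parent)
        components.append(component)
    return components, graph, reverse

def reachable(sources, adjacency):
    """All nodes reachable from `sources` (including the sources themselves)."""
    seen, stack = set(sources), list(sources)
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def bow_tie(edges):
    """Return (core, IN, OUT) around the largest strongly connected component."""
    components, graph, reverse = strongly_connected_components(edges)
    core = max(components, key=len)
    return core, reachable(core, reverse) - core, reachable(core, graph) - core

def in_degree_histogram(edges):
    """Map in-degree -> number of nodes with that in-degree (zero in-degree omitted)."""
    return Counter(Counter(v for _, v in edges).values())
```

On a toy graph where `a -> b -> c -> a` forms a cycle, `x` links into `a`, and `c -> y -> z` hangs off the cycle, the core is {a, b, c}, IN is {x}, and OUT is {y, z}. Fitting a power law to `in_degree_histogram` output (for example on a log-log scale) is where the paper's heavy-tail observation would show up.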
A talk about best practices and patterns for designing an efficient cube in Apache Kylin, covering concepts such as mandatory dimensions, hierarchy dimensions, derived dimensions, incremental builds, and aggregation groups.
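Kylin precomputes a cuboid for each combination of dimensions it keeps, so a full cube over n dimensions has 2^n cuboids; the dimension types above exist largely to prune that number. A minimal sketch of the counting idea for a single aggregation group, assuming the common reading of the rules: a mandatory dimension appears in every cuboid, a hierarchy of k levels allows only its k + 1 prefixes, and derived dimensions are left out of the cuboids entirely. It deliberately ignores Kylin's exact bookkeeping (for example, how the base cuboid and overlaps between aggregation groups are counted), and the dimension names are hypothetical. A brute-force enumeration cross-checks the closed-form count.

```python
from itertools import combinations

def cuboid_count(normal, mandatory=(), hierarchies=()):
    """Closed-form cuboid count for one aggregation group (sketch).

    Each normal dimension doubles the count, mandatory dimensions are always
    present (factor 1), and a hierarchy of k levels contributes k + 1 prefixes.
    Derived dimensions are not passed in: they never form cuboids of their own.
    """
    count = 2 ** len(normal)
    for levels in hierarchies:
        count *= len(levels) + 1
    return count

def is_valid_cuboid(dims, mandatory, hierarchies):
    chosen = set(dims)
    if not set(mandatory) <= chosen:
        return False
    for levels in hierarchies:
        present = [level in chosen for level in levels]
        # A level may appear only if every level above it also appears.
        if present != sorted(present, reverse=True):
            return False
    return True

def brute_force_count(normal, mandatory=(), hierarchies=()):
    """Enumerate every subset of dimensions and keep those the rules allow."""
    all_dims = list(normal) + list(mandatory) + [d for h in hierarchies for d in h]
    return sum(
        is_valid_cuboid(subset, mandatory, hierarchies)
        for r in range(len(all_dims) + 1)
        for subset in combinations(all_dims, r)
    )

# Hypothetical dimensions for a sales cube.
NORMAL = ["product", "channel"]
MANDATORY = ["order_date"]
HIERARCHIES = [["country", "province", "city"]]
```

With these six dimensions a full cube would have 2^6 = 64 cuboids; the mandatory dimension and the three-level hierarchy reduce that to 2^2 * 4 = 16, and both functions agree on 16.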
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database (Edureka!)
NoSQL covers a wide range of database technologies that were developed in response to the surging volume of stored data. Relational databases struggle to cope with this volume and face agility challenges. This is where NoSQL databases come into play, and they have become popular because of their features. The session covers the following topics to help you choose the right NoSQL database:
- Traditional databases
- Challenges with traditional databases
- CAP Theorem
- NoSQL to the rescue
- A BASE system
- Choose the right NoSQL database
The relationships between data sets matter. Discovering, analyzing, and learning those relationships is central to expanding our understanding and is a critical step toward being able to predict and act upon the data. Unfortunately, these are not always simple or quick tasks.
To help the analyst, we introduce RAPIDS, a collection of open-source libraries incubated by NVIDIA and focused on accelerating the complete end-to-end data science ecosystem. Graph analytics is a critical piece of the data science ecosystem for processing linked data, and RAPIDS is pleased to offer cuGraph as its accelerated graph library.
Simply accelerating algorithms addresses only a portion of the problem. To address the full problem space, RAPIDS cuGraph strives to be feature-rich, easy to use, and intuitive. Rather than limiting the solution to a single graph technology, cuGraph supports property graphs, knowledge graphs, hypergraphs, bipartite graphs, and basic directed and undirected graphs.
A Python API allows the data to be manipulated as a DataFrame, similar to and compatible with pandas, with inputs and outputs shared across the full RAPIDS suite, for example with the RAPIDS machine learning package, cuML.
This talk will present an overview of RAPIDS and cuGraph, discuss and show examples of how to manipulate and analyze bipartite and property graphs, and show how data can be shared with machine learning algorithms. It will include some performance and scalability metrics and conclude with a preview of upcoming features, such as graph query language support, and the general RAPIDS roadmap.
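cuGraph itself needs an NVIDIA GPU, so as a library-free illustration of two common bipartite-graph operations, here is a plain-Python sketch: checking that an undirected edge list is bipartite by two-colouring it with breadth-first search, and projecting it onto one side (two nodes are linked if they share a neighbour). This does not use or reproduce the cuGraph API; the function names and the user/item example are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def two_coloring(edges):
    """Return {node: 0 or 1} if the undirected graph is bipartite, else None."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    color = {}
    for start in adjacency:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in color:
                    color[neighbour] = 1 - color[node]
                    queue.append(neighbour)
                elif color[neighbour] == color[node]:
                    return None  # odd cycle: not bipartite
    return color

def project(edges, side):
    """One-mode projection: link two nodes of `side` that share a neighbour."""
    members_by_neighbour = defaultdict(set)
    for u, v in edges:
        if u in side:
            members_by_neighbour[v].add(u)
        if v in side:
            members_by_neighbour[u].add(v)
    projected = set()
    for group in members_by_neighbour.values():
        projected.update(combinations(sorted(group), 2))
    return projected
```

For user-item edges `u1-i1, u2-i1, u2-i2, u3-i2`, projecting onto the users gives `{("u1", "u2"), ("u2", "u3")}`, while a triangle `a-b-c` is rejected as non-bipartite.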
Presented at JavaOne 2013, Tuesday September 24.
"Data Modeling Patterns" co-created with Ian Robinson.
"Pitfalls and Anti-Patterns" created by Ian Robinson.
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ... (Flink Forward)
Flink Forward San Francisco 2022.
Probably everyone who has written stateful Apache Flink applications has used one of the fault-tolerant keyed state primitives ValueState, ListState, and MapState. With RocksDB, however, retrieving and updating items comes at an increased cost that you should be aware of. Sometimes these costs cannot be avoided with the current API, e.g., for efficient event-time stream sorting or streaming joins where you need to iterate over one or two buffered streams in the right order. With FLIP-220, we are introducing a new state primitive: BinarySortedMultiMapState. This new form of state lets you (a) efficiently store lists of values for a user-provided key and (b) iterate over keyed state in a well-defined sort order. Both features can be backed efficiently by RocksDB, with a 2x performance improvement over the current workarounds. This talk will go into the details of the new API and its implementation, show how to use it in your application, and discuss the process of getting it into Flink.
by Nico Kruber
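To make the two capabilities concrete, here is an in-memory Python model of the semantics described above: several values per user key, and iteration over user keys in sorted order, used for event-time sorting by draining everything below a watermark. This is only a sketch under that reading of the abstract; in Flink the state would be scoped per Flink key and backed by RocksDB's sorted binary keys, and the class and method names here are hypothetical, not the FLIP-220 API.

```python
import bisect

class SortedMultiMap:
    """Hypothetical in-memory model: sorted user keys, each with a list of values."""

    def __init__(self):
        self._keys = []    # user keys kept in ascending order
        self._values = {}  # user key -> values in insertion order

    def add(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
            self._values[key] = []
        self._values[key].append(value)

    def get(self, key):
        return list(self._values.get(key, []))

    def items(self, lo=None, hi=None):
        """Yield (key, values) for lo <= key < hi in ascending key order."""
        start = 0 if lo is None else bisect.bisect_left(self._keys, lo)
        stop = len(self._keys) if hi is None else bisect.bisect_left(self._keys, hi)
        for key in self._keys[start:stop]:
            yield key, list(self._values[key])

    def remove(self, key):
        if self._values.pop(key, None) is not None:
            self._keys.pop(bisect.bisect_left(self._keys, key))

def drain_until(buffer, watermark):
    """Event-time sorting: emit buffered events with timestamp < watermark, in order."""
    emitted = []
    for timestamp, events in list(buffer.items(hi=watermark)):  # copy before removing
        emitted.extend(events)
        buffer.remove(timestamp)
    return emitted
```

Events buffered out of order at timestamps 5, 1, 3, and 1 come out as the two timestamp-1 events followed by the timestamp-3 event once the watermark passes 4, while the timestamp-5 event stays buffered.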
This document discusses Looker Studio, a data visualization tool. It recommends Looker Studio for beginners because it is cloud-based, requires no installation, allows login with a Gmail account, and has a free tier and a helpful community. The document provides an example of building a visualization using sample book data from a Google Sheet. It demonstrates some basic visualization features in Looker Studio and encourages the reader to try creating their own visualization with sample data from their work or personal finances.
PostgreSQL (or Postgres) began its life in 1986 as POSTGRES, a research project of the University of California at Berkeley.
PostgreSQL isn't just relational; it's object-relational. This gives it some advantages over other open-source SQL databases such as MySQL, MariaDB, and Firebird.
Data Lineage, Property Based Testing & Neo4j (Neo4j)
Neo4j can be used to create data lineage graphs that track how data changes through transformations and processes. These graphs make it easy to identify errors, evaluate impacts, and improve communication by showing the relationships between data. The document also claims that more than 90% of algorithms can be modeled as graphs, and argues that graphs are well suited to hosting process chains whose relationships can be filtered. It demonstrates how Neo4j can be used to build a data lineage graph from metadata and answer questions about where restricted data became exposed.
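The "where did restricted data become exposed?" question is, at its core, reachability over the lineage graph plus path reconstruction. The talk answers it in Neo4j; as a plain-Python sketch of the same idea (not Cypher, and not the talk's data model), assuming lineage is available as (source dataset, derived dataset) pairs, a breadth-first search from the restricted sources returns, for each public dataset they reach, one shortest chain of transformations. The dataset names are hypothetical.

```python
from collections import defaultdict, deque

def exposure_paths(flows, restricted, public):
    """Map each public dataset reachable from restricted data to one shortest lineage path."""
    downstream = defaultdict(list)
    for source, derived in flows:
        downstream[source].append(derived)

    # Breadth-first search from all restricted sources, remembering each node's parent.
    parent = {node: None for node in restricted}
    queue = deque(restricted)
    while queue:
        node = queue.popleft()
        for nxt in downstream[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    exposed = {}
    for node in public:
        if node in parent:
            path, current = [], node
            while current is not None:  # walk back to the restricted source
                path.append(current)
                current = parent[current]
            exposed[node] = path[::-1]
    return exposed

# Hypothetical lineage: raw SSNs flow through cleaning and aggregation into a dashboard.
FLOWS = [
    ("ssn_raw", "customer_clean"),
    ("customer_clean", "customer_agg"),
    ("customer_agg", "public_dashboard"),
    ("sales_raw", "public_dashboard"),
    ("sales_raw", "sales_report"),
]
```

With `ssn_raw` marked restricted, `public_dashboard` is reported as exposed via `ssn_raw -> customer_clean -> customer_agg -> public_dashboard`, while `sales_report` is not reached.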
This document provides an overview of the Hadoop MapReduce Fundamentals course. It discusses what Hadoop is, why it is used, common business problems it can address, and companies that use Hadoop. It also outlines the core parts of Hadoop distributions and the Hadoop ecosystem, and covers core concepts such as HDFS and the MapReduce programming model. The document includes several code examples and screenshots related to Hadoop and MapReduce.
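The MapReduce programming model splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. A single-process Python sketch of the classic word count shows the three stages; real Hadoop runs many mappers and reducers in parallel over HDFS blocks and performs the shuffle across the network, and none of these function names are Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all counts for one word."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))
```

Calling `word_count(["the cat", "The dog the end"])` counts "the" three times because the mapper lower-cases its input before emitting pairs.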
Kotlin delegates in practice - Kotlin Everywhere Stockholm (Fabio Collini)
The lazy delegate is probably the most famous Kotlin delegate; it’s easy to use and can be really useful. However, delegation is a concept that can be used in many other ways in Kotlin. A delegate can be declared at two levels:
* a delegated property allows changing the way the property is managed
* an interface can be implemented by delegating its methods to another object
In this talk we’ll see many practical examples to show how to leverage standard delegates and how to create new ones to improve the quality of our code and to avoid duplication.
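Since the new examples in this listing use Python, here are rough Python analogues of the two levels of delegation rather than Kotlin code. A descriptor plays the role of a delegated property, loosely modelled on Kotlin's `Delegates.observable`, and `__getattr__` forwarding stands in for interface delegation (`class Derived(b: Base) : Base by b`). The analogy is imperfect: Kotlin's interface delegation is resolved at compile time and limited to the interface's members, whereas this forwarding is dynamic and forwards any missing attribute. All class names are hypothetical.

```python
class Observable:
    """Descriptor loosely analogous to Kotlin's Delegates.observable."""

    def __init__(self, initial, on_change):
        self.initial = initial
        self.on_change = on_change

    def __set_name__(self, owner, name):
        self.storage = "_" + name  # per-instance backing attribute

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage, self.initial)

    def __set__(self, obj, value):
        old = self.__get__(obj)
        setattr(obj, self.storage, value)
        self.on_change(obj, old, value)

changes = []

class User:
    # Property access is managed by the descriptor instead of a plain field.
    name = Observable("<no name>", lambda obj, old, new: changes.append((old, new)))

class ConsoleLogger:
    def log(self, message):
        return f"log: {message}"

class Service:
    """Forwards unknown attributes to a delegate, similar in spirit to `by` delegation."""

    def __init__(self, logger):
        self._delegate = logger

    def run(self):
        return self.log("running")  # resolved through __getattr__ forwarding

    def __getattr__(self, name):
        # Only called when normal lookup fails; guard avoids recursion before __init__.
        if name == "_delegate":
            raise AttributeError(name)
        return getattr(self._delegate, name)
```

Assigning `"first"` and then `"second"` to `User().name` records the transitions `("<no name>", "first")` and `("first", "second")`, and `Service(ConsoleLogger()).run()` reaches the logger's `log` method without `Service` defining it.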
Stephan Ewen - Experiences running Flink at Very Large Scale (Ververica)
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently make a job particularly demanding, show how to configure and tune a large-scale Flink job, and outline what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into analyzing and tuning checkpointing, selecting and configuring state backends, understanding common bottlenecks, and understanding and configuring network parameters.
Time Series Analysis with Spark by Sandy Ryza (Spark Summit)
The document contains examples of time series data in various formats:
- Observations with a timestamp, key, and value
- Instants with timestamps and values for keys A, B, and C
- A time series DataFrame with a DateTimeIndex and values for keys A, B, and C
It also shows examples of working with time series data in Spark using TimeSeriesRDDs, including slicing data, filling in missing values, removing serial correlations, and fitting ARIMA and GARCH models.
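The three layouts above hold the same data arranged differently: one record per observation, one record per instant across keys, or one aligned vector per key. A pure-Python sketch converts observations into the other two and linearly fills interior gaps in a series. The talk itself used Spark's TimeSeriesRDD; these function names are hypothetical, not the spark-ts API, and the fill interpolates by position, so it assumes a uniformly spaced index.

```python
from collections import defaultdict

def to_instants(observations):
    """(timestamp, key, value) records -> {timestamp: {key: value}}, sorted by time."""
    instants = defaultdict(dict)
    for timestamp, key, value in observations:
        instants[timestamp][key] = value
    return dict(sorted(instants.items()))

def to_series(observations, index):
    """(timestamp, key, value) records -> {key: [value or None per index slot]}."""
    position = {timestamp: i for i, timestamp in enumerate(index)}
    series = defaultdict(lambda: [None] * len(index))
    for timestamp, key, value in observations:
        series[key][position[timestamp]] = value
    return dict(series)

def fill_linear(values):
    """Linearly interpolate interior gaps; leading/trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

# Hypothetical example data on a uniform index of four time steps.
INDEX = [1, 2, 3, 4]
OBSERVATIONS = [(1, "A", 1.0), (3, "A", 3.0), (4, "A", 4.0), (1, "B", 10.0), (2, "B", 20.0)]
```

Here series A becomes `[1.0, None, 3.0, 4.0]` and fills to `[1.0, 2.0, 3.0, 4.0]`, while B's trailing gaps are left as `None` because there is no right-hand value to interpolate towards.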
Google began as BackRub, a search engine research project by Larry Page and Sergey Brin at Stanford University, and was founded as Google in 1998. The name was inspired by the mathematical term 'googol', meaning 1 followed by 100 zeros. Google provides popular services such as Google Search, Gmail, Google Maps, Google Photos, YouTube, Android, Google Docs, Google Drive, and Google Translate. It also makes hardware products such as Google Pixel smartphones, Google Home speakers, and other devices. Google's services handle over a billion search queries each day.
Aljoscha Krettek is the PMC chair of Apache Flink and Apache Beam, and co-founder of data Artisans. Apache Flink is an open-source platform for distributed stream and batch data processing. It supports stateful computations over data streams, both in real time and over historical data. Flink supports batch and stream processing through APIs such as DataSet and DataStream. data Artisans originated Flink and provides an application platform powered by Flink and Kubernetes for building stateful stream processing applications.
Gurpreet Singh from Microsoft gave a talk on scaling Python for data analysis and machine learning using Dask and Apache Spark. He discussed the challenges of scaling the Python data stack and compared options such as Dask, Spark, and Spark MLlib. He provided examples of using Dask and PySpark DataFrames for parallel processing and showed how Dask-ML can be used to parallelize scikit-learn models. Distributed deep learning with tools like Project Hydrogen was also covered.
Family tree of data – provenance and neo4j (M. David Allen)
The document discusses using Neo4j, a graph database, to store and query provenance data. Some key points:
- Storing provenance in a relational database requires complex SQL and pushes graph operations into code, hurting performance on graph queries.
- Neo4j uses the Cypher query language which allows declarative graph queries without imperative code.
- Example Cypher queries are provided to demonstrate retrieving paths and relationships in a provenance graph.
- While graph databases provide better performance for graph queries, they have limitations for certain bulk scans compared to relational databases. Proper graph design is important.
GPORCA is the query optimizer used inside Greenplum Database, the first open-source MPP solution based on PostgreSQL.
These slides were presented at PGConf Seattle 2017. They introduce the internals of GPORCA and give OSS developers the context needed to contribute back to the project.
3. Relationships Matter: Using Connected Data for Better Machine Learning (Neo4j)
The document discusses how graph databases and graph data science can be used to enhance machine learning models by incorporating relationship data. It provides examples of how organizations are using Neo4j's graph data science platform to improve predictive models in areas like fraud detection, health outcomes, and supply chain reliability. The platform includes over 50 graph algorithms, graph-native machine learning workflows, and the ability to train, apply, and manage predictive models on graph data.
Educational slides on TRACLUS, an algorithm for clustering trajectory data created by Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang, published at SIGMOD '07.
http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf
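TRACLUS groups trajectory line segments using a distance built from perpendicular, parallel, and angular components, computed with the longer segment as the reference Li. A Python sketch of two of those components, following the definitions as we read them in the paper: the perpendicular distance is (l1^2 + l2^2) / (l1 + l2), where l1 and l2 are the distances from the shorter segment's endpoints to the line through Li, and the angular distance is ||Lj|| * sin(theta) for theta below 90 degrees and ||Lj|| otherwise. The parallel component and the weighting of the three terms are omitted here, and segments are assumed to have non-zero length.

```python
import math

def _project(point, a, b):
    """Projection of `point` onto the infinite line through a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, point
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    return (ax + t * dx, ay + t * dy)

def longer_first(seg_a, seg_b):
    """Order two segments so the longer one plays the role of Li."""
    return (seg_a, seg_b) if math.dist(*seg_a) >= math.dist(*seg_b) else (seg_b, seg_a)

def perpendicular_distance(li, lj):
    """Lehmer mean of the endpoint-to-line distances of lj against li."""
    (si, ei), (sj, ej) = li, lj
    l1 = math.dist(sj, _project(sj, si, ei))
    l2 = math.dist(ej, _project(ej, si, ei))
    return 0.0 if l1 + l2 == 0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)

def angular_distance(li, lj):
    """||lj|| * sin(theta) if theta < 90 degrees, else ||lj||."""
    (si, ei), (sj, ej) = li, lj
    vi = (ei[0] - si[0], ei[1] - si[1])
    vj = (ej[0] - sj[0], ej[1] - sj[1])
    length_j = math.hypot(*vj)
    cos_theta = (vi[0] * vj[0] + vi[1] * vj[1]) / (math.hypot(*vi) * length_j)
    if cos_theta <= 0:  # theta >= 90 degrees
        return length_j
    return length_j * math.sqrt(max(0.0, 1.0 - cos_theta ** 2))
```

For a reference segment from (0, 0) to (4, 0), a parallel segment one unit above it has perpendicular distance 1.0 and angular distance 0.0, while a vertical segment from (1, 0) to (1, 2) scores 2.0 on both.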
The document discusses asynchronous programming and different approaches to handling asynchronous operations. It covers concepts such as asynchronous functions, asynchronous callbacks, and asynchronous events. It then describes different programming models, including synchronous code, the Asynchronous Programming Model (APM), the Event-based Asynchronous Pattern (EAP), the Task-based Asynchronous Pattern (TAP), and async/await. Code examples are provided to illustrate each approach.
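APM, EAP, and TAP are .NET patterns; as a language-neutral illustration of the shift they describe, here is a Python asyncio analogue rather than .NET code. The same two operations are consumed first through completion callbacks, closer in spirit to the callback- and event-based styles, and then with async/await, closer to TAP. `fetch` is a hypothetical stand-in for an I/O-bound call.

```python
import asyncio

async def fetch(x):
    """Hypothetical I/O-bound operation; the sleep stands in for waiting on I/O."""
    await asyncio.sleep(0.01)
    return x * 2

def fetch_with_callback(x, callback):
    """Callback style: start the work and invoke `callback` with the result later."""
    task = asyncio.ensure_future(fetch(x))
    task.add_done_callback(lambda finished: callback(finished.result()))
    return task

async def callback_style():
    results, done = [], asyncio.Event()

    def on_result(value):
        results.append(value)
        if len(results) == 2:  # manual bookkeeping to know when everything finished
            done.set()

    fetch_with_callback(1, on_result)
    fetch_with_callback(2, on_result)
    await done.wait()
    return sorted(results)  # completion order is not guaranteed

async def await_style():
    # async/await: sequencing and result collection read like synchronous code.
    return list(await asyncio.gather(fetch(1), fetch(2)))
```

Both `asyncio.run(callback_style())` and `asyncio.run(await_style())` produce `[2, 4]`; the difference is that the callback version needs its own completion tracking and ordering, while `gather` returns results in argument order.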
The document discusses an overview presentation on Apache NiFi given by Timothy Spann. The presentation covered what NiFi is, how to install it, its terminology, user interface, extensibility, and ecosystem. It also included a demonstration of how to add a processor for data intake within 1 minute. The presentation was part of a larger meetup event on the future of data.
netElastic is a software developer that offers virtual broadband network gateway (vBNG) and routing products. Their vBNG can route at up to 360 Gbps on x86 servers, providing an alternative to physical routers. As a virtual product, the vBNG provides benefits like agility, flexibility, easy upgrades and scalability at a lower total cost of ownership. It supports subscriber management services and carrier network protocols to function as an access services router at the edge of broadband networks.
This document provides information on various network infrastructure components located across different geographic clusters. It lists specific equipment identifiers and locations for IP Cores, IP RAN equipment, aggregators and pre-aggregators located in Kepong, Subang Hitech, and other areas. Connections between these components and various core networks are also indicated.
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
NoSQL includes a wide range of different database technologies and were developed as a result of surging volume of data stored. Relational databases are not capable of coping with this huge volume and faces agility challenges. This is where NoSQL databases have come in to play and are popular because of their features. The session covers the following topics to help you choose the right NoSQL databases:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
The relationships between data sets matter. Discovering, analyzing, and learning those relationships is a central part to expanding our understand, and is a critical step to being able to predict and act upon the data. Unfortunately, these are not always simple or quick tasks.
To help the analyst we introduce RAPIDS, a collection of open-source libraries, incubated by NVIDIA and focused on accelerating the complete end-to-end data science ecosystem. Graph analytics is a critical piece of the data science ecosystem for processing linked data, and RAPIDS is pleased to offer cuGraph as our accelerated graph library.
Simply accelerating algorithms only addressed a portion of the problem. To address the full problem space, RAPIDS cuGraph strives to be feature-rich, easy to use, and intuitive. Rather than limiting the solution to a single graph technology, cuGraph supports Property Graphs, Knowledge Graphs, Hyper-Graphs, Bipartite graphs, and the basic directed and undirected graph.
A Python API allows the data to be manipulated as a DataFrame, similar and compatible with Pandas, with inputs and outputs being shared across the full RAPIDS suite, for example with the RAPIDS machine learning package, cuML.
This talk will present an overview of RAPIDS and cuGraph. Discuss and show examples of how to manipulate and analyze bipartite and property graph, plus show how data can be shared with machine learning algorithms. The talk will include some performance and scalability metrics. Then conclude with a preview of upcoming features, like graph query language support, and the general RAPIDS roadmap.
Presented at JavaOne 2013, Tuesday September 24.
"Data Modeling Patterns" co-created with Ian Robinson.
"Pitfalls and Anti-Patterns" created by Ian Robinson.
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
Flink Forward San Francisco 2022.
Probably everyone who has written stateful Apache Flink applications has used one of the fault-tolerant keyed state primitives ValueState, ListState, and MapState. With RocksDB, however, retrieving and updating items comes at an increased cost that you should be aware of. Sometimes, these may not be avoidable with the current API, e.g., for efficient event-time stream-sorting or streaming joins where you need to iterate one or two buffered streams in the right order. With FLIP-220, we are introducing a new state primitive: BinarySortedMultiMapState. This new form of state offers you to (a) efficiently store lists of values for a user-provided key, and (b) iterate keyed state in a well-defined sort order. Both features can be backed efficiently by RocksDB with a 2x performance improvement over the current workarounds. This talk will go into the details of the new API and its implementation, present how to use it in your application, and talk about the process of getting it into Flink.
by
Nico Kruber
This document discusses Looker Studio, a data visualization tool. It recommends Looker Studio for beginners because it is cloud-based, requires no installation, allows login with Gmail, and has a free tier and helpful community. The document provides an example of building a visualization using sample book data from a Google Sheet. It demonstrates some basic visualization features in Looker Studio and encourages the reader to try creating their own visualization with sample data from their work or personal finances.
PostgreSQL (or Postgres) began its life in 1986 as POSTGRES, a research project of the University of California at Berkeley.
PostgreSQL isn't just relational, it's object-relational.it's object-relational. This gives it some advantages over other open source SQL databases like MySQL, MariaDB and Firebird.
Data Lineage, Property Based Testing & Neo4j Neo4j
Neo4j can be used to create data lineage graphs that track how data changes through transformations and processes. These graphs make it easy to identify errors, evaluate impacts, and improve communication by showing the relationships between data. More than 90% of algorithms can be modeled as graphs, and graphs are well-suited to hosting process chains with relationships that can be filtered. The document demonstrates how Neo4j can be used to build a data lineage graph from metadata and answer questions about where restricted data became exposed.
This document provides an overview of the Hadoop MapReduce Fundamentals course. It discusses what Hadoop is, why it is used, common business problems it can address, and companies that use Hadoop. It also outlines the core parts of Hadoop distributions and the Hadoop ecosystem. Additionally, it covers common MapReduce concepts like HDFS, the MapReduce programming model, and Hadoop distributions. The document includes several code examples and screenshots related to Hadoop and MapReduce.
Kotlin delegates in practice - Kotlin Everywhere StockholmFabio Collini
The lazy delegate is probably the most famous Kotlin delegate, it’s easy to use and can be really useful. However delegation is a concept that can be used in many other ways in Kotlin. A delegate can be declared at two levels:
* a delegated property allows changing the way the property is managed
* an interface can be implemented delegating the methods to another object
In this talk we’ll see many practical examples to show how to leverage standard delegates and how to create new ones to improve the quality of our code and to avoid duplication.
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
This talk shares experiences from deploying and tuning Flink steam processing applications for very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job as particularly demanding, show how to configure and tune a large scale Flink job, and outline what the Flink community is working on to make the out-of-the-box for experience as smooth as possible. We will, for example, dive into - analyzing and tuning checkpointing - selecting and configuring state backends - understanding common bottlenecks - understanding and configuring network parameters
Time Series Analysis with Spark by Sandy RyzaSpark Summit
The document contains examples of time series data in various formats:
- Observations with a timestamp, key, and value
- Instants with timestamps and values for keys A, B, and C
- A time series DataFrame with a DateTimeIndex and values for keys A, B, and C
It also shows examples of working with time series data in Spark using TimeSeriesRDDs, including slicing data, filling in missing values, removing serial correlations, and fitting ARIMA and GARCH models.
Google was founded in 1998 as a search engine called BackRub by Larry Page and Sergey Brin as a research project at Stanford University. It was later renamed Google, inspired by the mathematical term 'googol' meaning 1 followed by 100 zeros. Google provides popular services like Google Search, Gmail, Google Maps, Google Photos, YouTube, Android, and more. It also focuses on hardware products like Google Pixel smartphones, Google Home speakers, and other devices. Google's services handle over a billion search queries each day and include Gmail, Google Docs, Google Drive, Google Translate, and more.
Aljoscha Krettek is the PMC chair of Apache Flink and Apache Beam, and co-founder of data Artisans. Apache Flink is an open-source platform for distributed stream and batch data processing. It allows for stateful computations over data streams in real-time and historically. Flink supports batch and stream processing using APIs like DataSet and DataStream. Data Artisans originated Flink and provides an application platform powered by Flink and Kubernetes for building stateful stream processing applications.
Gurpreet Singh from Microsoft gave a talk on scaling Python for data analysis and machine learning using DASK and Apache Spark. He discussed the challenges of scaling the Python data stack and compared options like DASK, Spark, and Spark MLlib. He provided examples of using DASK and PySpark DataFrames for parallel processing and showed how DASK-ML can be used to parallelize Scikit-Learn models. Distributed deep learning with tools like Project Hydrogen was also covered.
Family tree of data – provenance and neo4j (M. David Allen)
The document discusses using Neo4j, a graph database, to store and query provenance data. Some key points:
- Storing provenance in a relational database requires complex SQL and pushes graph operations into code, hurting performance on graph queries.
- Neo4j uses the Cypher query language which allows declarative graph queries without imperative code.
- Example Cypher queries are provided to demonstrate retrieving paths and relationships in a provenance graph.
- While graph databases provide better performance for graph queries, they have limitations for certain bulk scans compared to relational databases. Proper graph design is important.
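A typical provenance question is "what was this artifact derived from, all the way back?" The sketch below answers it with a breadth-first traversal in Python. This is the kind of imperative code a relational design pushes into the application. The graph, the `Artifact` label, the `name` property, and the `DERIVED_FROM` relationship type are hypothetical and do not come from the deck. The comment shows how the same question could be written declaratively as a variable-length Cypher pattern.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical provenance graph: each artifact maps to the artifacts it was
# derived from. Stored in Neo4j as (:Artifact)-[:DERIVED_FROM]->(:Artifact),
# the same lineage question could be asked declaratively, e.g.:
#   MATCH (r:Artifact {name: 'report'})-[:DERIVED_FROM*]->(a)
#   RETURN DISTINCT a.name


def ancestors(derived_from: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every artifact that `start` transitively depends on (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(derived_from.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # shared ancestors (diamonds) are visited once
            continue
        seen.add(node)
        queue.extend(derived_from.get(node, []))
    return seen


lineage = {
    "report": ["clean_table"],
    "clean_table": ["raw_a", "raw_b"],
    "raw_a": [],
    "raw_b": [],
}
```

`ancestors(lineage, "report")` returns `{"clean_table", "raw_a", "raw_b"}`.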
GPORCA is the query optimizer used inside Greenplum Database, the first open source MPP solution based on PostgreSQL.
These slides were presented at PGConf Seattle 2017. They introduce the internals of GPORCA and give OSS developers the context they need to contribute back to the project.
3. Relationships Matter: Using Connected Data for Better Machine Learning (Neo4j)
The document discusses how graph databases and graph data science can be used to enhance machine learning models by incorporating relationship data. It provides examples of how organizations are using Neo4j's graph data science platform to improve predictive models in areas like fraud detection, health outcomes, and supply chain reliability. The platform includes over 50 graph algorithms, graph-native machine learning workflows, and the ability to train, apply, and manage predictive models on graph data.
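One simple way to bring relationship data into a model is to compute graph-derived columns and append them to each record's ordinary features. The hand-rolled sketch below computes two such features per node: its degree, and the share of its neighbours that are already flagged, for example as fraudulent. It does not use Neo4j's Graph Data Science API. The feature choice and the account data are hypothetical examples.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def relationship_features(edges: List[Edge], flagged: Set[str]) -> Dict[str, Tuple[int, float]]:
    """For each node, compute (degree, share of neighbours that are flagged).

    Edges are treated as undirected. The resulting columns can be joined
    onto a node's tabular features before training a model.
    """
    neighbours: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    features: Dict[str, Tuple[int, float]] = {}
    for node, nbrs in neighbours.items():
        flagged_share = len(nbrs & flagged) / len(nbrs)
        features[node] = (len(nbrs), flagged_share)
    return features


accounts = [("acct1", "acct2"), ("acct1", "acct3"), ("acct2", "acct3"), ("acct3", "acct4")]
feats = relationship_features(accounts, flagged={"acct4"})
```

`acct3` gets `(3, 1/3)` because one of its three neighbours is flagged. A purely tabular model would never see that signal.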
Educational slides on TRACLUS, an algorithm for clustering trajectory data created by Jae-Gil Lee, Jiawei Han and Kyu-Young Wang, published at SIGMOD’07.
http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf
The document discusses asynchronous programming and different approaches to handling asynchronous operations. It covers core concepts such as asynchronous functions, callbacks, and events. It then compares several programming models: synchronous code, the Asynchronous Programming Model (APM), the Event-based Asynchronous Pattern (EAP), the Task-based Asynchronous Pattern (TAP), and async/await. Code examples illustrate each approach.
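The patterns named above (APM, EAP, TAP) come from .NET. As a language-neutral illustration, the sketch below uses Python's `asyncio` instead to contrast the callback style with async/await. In the callback style, the result is delivered to a separate function. With async/await, the code reads top to bottom like synchronous code. `fetch` is a hypothetical stand-in for an I/O-bound call.

```python
import asyncio
from typing import Callable, List


async def fetch(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound operation such as a network request."""
    await asyncio.sleep(delay)
    return f"{name} done"


def fetch_with_callback(name: str, delay: float, on_done: Callable[[str], None]) -> asyncio.Task:
    """Callback style: start the operation and register a continuation for its result."""
    task = asyncio.create_task(fetch(name, delay))
    task.add_done_callback(lambda t: on_done(t.result()))
    return task


async def main() -> List[str]:
    results: List[str] = []
    # Callback style: the result is appended by the registered continuation.
    await fetch_with_callback("callback", 0.01, results.append)
    # async/await style: gather() runs both operations concurrently and
    # returns their results in argument order, not completion order.
    results.extend(await asyncio.gather(fetch("first", 0.02), fetch("second", 0.01)))
    return results
```

Running `asyncio.run(main())` returns `["callback done", "first done", "second done"]`, even though "second" finishes before "first".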
This document summarizes an overview of Apache NiFi presented by Timothy Spann. The talk covered what NiFi is, how to install it, its terminology, user interface, extensibility, and ecosystem. It also included a demonstration of adding a data-intake processor in under a minute. The presentation was part of a larger meetup on the future of data.
netElastic is a software developer that offers virtual broadband network gateway (vBNG) and routing products. Their vBNG can route at up to 360Gbps on x86 servers, providing an alternative to physical routers. As a virtual product, the vBNG provides benefits like agility, flexibility, easy upgrades and scalability at a lower total cost of ownership. It supports subscriber management services and carrier network protocols to function as an access services router at the edge of broadband networks.
This document provides information on various network infrastructure components located across different geographic clusters. It lists specific equipment identifiers and locations for IP Cores, IP RAN equipment, aggregators and pre-aggregators located in Kepong, Subang Hitech, and other areas. Connections between these components and various core networks are also indicated.
Huawei's overall business strategy is to create more connections through intelligent devices and ICT infrastructure. Huawei aims to enable digital transformation by combining ICT infrastructure with intelligent devices. This will allow all things to connect to the cloud and enable cloud-native operations and cloud-cloud interconnection.
This document summarizes Frank Yuan's presentation on broadband universal service for a better connected digital Cambodia. It discusses the need for better broadband connectivity globally and in emerging markets. It outlines supportive strategies and policies that are key to achieving connectivity goals by 2023, including universal service obligations. The document reviews ITU recommendations on digital infrastructure policy and regulation in Asia-Pacific related to strategies, spectrum, site infrastructure, and standards. It presents data on mobile site density in various countries and discusses Huawei's innovative rural coverage solutions. Finally, it emphasizes shared responsibility among governments, operators, and vendors to achieve better connectivity.
Tejas Networks Limited presented an investor presentation on its acquisition of Saankhya Labs. The key points are:
1) Tejas Networks will acquire 64.4% of the shares in Saankhya Labs for Rs 283.94 crore to gain expertise in wireless communication, 5G, broadcast, and satellite technologies as well as semiconductor design.
2) Saankhya Labs will complement Tejas Networks' product portfolio and provide access to new customers.
3) The acquisition will accelerate Tejas Networks' R&D investments to develop a larger portfolio of telecom products and help achieve its vision of becoming a top global telecom product company.
Huawei's document discusses its next-generation Wi-Fi 6 solution. It explains how the innovations in the Wi-Fi 6 standard solve problems inherent in traditional Wi-Fi applications, and how Wi-Fi 6 meets the requirements of new applications that need ultra-low latency and high-density connections. The document outlines how Huawei's Wi-Fi 6 solution can reduce costs by converging Wi-Fi and IoT networks. It also claims industry-leading performance through innovations such as smart antennas and effective power amplifiers. Finally, it describes how Huawei builds a 'proactive O&M' platform to intelligently manage and optimize large-scale Wi-Fi 6 networks.
The document discusses 5G RAN equipment evolution, including:
1) 5G standard progress and forecasts, with initial 5G focusing on eMBB and later enhancements targeting URLLC and mMTC.
2) Diverse solutions for different coverage scenarios, including macro cells using sub-6GHz and mmWave bands, and small cells for indoor and hotspot coverage.
3) The 5G RAN roadmap, showing the planned development and release of macro cell and small cell products supporting sub-6GHz and mmWave bands between 2018 and 2021.
Huawei's eLTE solution provides high-speed broadband trunking and other services for enterprise users. It uses LTE technology to address challenges in industries like transportation, energy, and government that currently rely on narrowband technologies with limited data capabilities. The solution offers multimedia trunking, ruggedized devices, network resilience features, and open interfaces for third-party integration. Huawei has over 60 commercial eLTE contracts worldwide and its solutions have been used in several "world first" industrial LTE applications.
1) The document discusses 5G and its rapid rollout globally, with over 60 countries having launched commercial 5G networks by 2020 and over 170 countries having released national digital strategies emphasizing 5G and AI.
2) It outlines challenges around 5G deployment and building the 5G business ecosystem, and presents Huawei's solutions to address technical challenges through continuous innovation and help operators succeed in 5G business through ecosystem construction.
3) Huawei is committed to long-term technology leadership through the largest R&D investment in the industry and has developed 10 leading 5G product solutions to empower operators with 5G.
This document outlines Orange's strategic plan, "Essentials2020". Under the plan, Orange aims to invest €15 billion in its networks between 2015 and 2018 to improve quality and coverage, and to make fiber available to 20 million homes in France by 2022. Orange also plans to increase its digital capabilities so that 50% of customer interactions are digital by 2018. Financially, it expects 2018 revenues to be higher than 2014 revenues and plans to maintain a dividend of at least €0.60 per share.
ZTE is a leader in 5G innovation and standardization. It has contributed key technologies to 5G standards, including multi-user shared access (MUSA) and filter bank multi-carrier (FB-MC). ZTE has also demonstrated 5G technologies and products, achieving over 50Gbps using mmWave and showing that MUSA can support tens of millions of connections. ZTE is working with mobile operators around the world on 5G tests and trials, aiming for the earliest possible commercial deployment of 5G networks.