Speaker: George Chiu 邱志威, Sr. Industry Consultant, Teradata
Learn how Netflix engages customers by leveraging Teradata as a critical component of its data and analytics platform to create a data-driven, customer-focused business.
Speaker: Bob Yin, Senior Product Specialist, Informatica
These Informatica Cloud offerings are pre-built packages for quick time-to-value for customers looking to fast-track cloud data management initiatives. For example, customers can quickly kick start a new Amazon Redshift data warehouse project and use Informatica Cloud Connector for Amazon Redshift to load it with meaningful connected data from cloud sources such as Salesforce.com or on-premises sources such as relational databases -- all within hours, not months.
Solr Under the Hood at S&P Global - Sumit Vadhera, S&P Global - Lucidworks
This document summarizes S&P Global's use of Solr for search capabilities across their large datasets. It discusses how S&P Global indexes over 50 million documents into Solr monthly and handles over 5 million queries per week. It outlines challenges faced with an on-premises Solr deployment and how migrating to SolrCloud helped address issues like performance, availability, and scalability. Next steps discussed include improving relevancy through data science, continuing to leverage new Solr features, and exploring ways to integrate machine learning into search capabilities.
Speaker: Xiaoyong Han, Solution Architect, AWS
Data collection and storage is a primary challenge for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• Overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• Leveraging Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection
This document discusses Neo4j, a graph database management system. It provides an overview of Neo4j, including what it is, why graph databases are useful, examples of graph databases, use cases for graph databases, and how to use Neo4j. It also describes a messenger bot project that uses Neo4j to find bus routes in Yangon, and provides resources for learning more about Neo4j.
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3... - Confluent
Pinterest moved 100TB of data from MySQL databases to S3 and Hadoop continuously using a new data pipeline. The pipeline uses Kafka to stream database change events in real time. It incorporates periodic compaction to merge snapshots and deltas into a compact format with 15-minute latency. The new system provides reliability and scalability, and enables features like real-time search and recommendations.
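The snapshot-plus-delta compaction step described above can be sketched in a few lines: the latest change event per primary key wins, and deletes drop the row. This is a minimal illustration of the pattern, not Pinterest's actual pipeline; the `id` and `board` field names are invented.

```python
# Minimal sketch of snapshot + delta (CDC) compaction: replay ordered change
# events on top of a base snapshot, letting the last write per key win.

def compact(snapshot, deltas):
    """Merge a base snapshot with ordered change events."""
    merged = {row["id"]: row for row in snapshot}
    for event in deltas:  # deltas are assumed ordered by commit time
        if event["op"] == "delete":
            merged.pop(event["id"], None)
        else:  # insert or update: keep everything except the op marker
            merged[event["id"]] = {k: v for k, v in event.items() if k != "op"}
    return sorted(merged.values(), key=lambda r: r["id"])

snapshot = [{"id": 1, "board": "recipes"}, {"id": 2, "board": "travel"}]
deltas = [
    {"op": "update", "id": 2, "board": "travel-2017"},
    {"op": "insert", "id": 3, "board": "diy"},
    {"op": "delete", "id": 1},
]
print(compact(snapshot, deltas))
# [{'id': 2, 'board': 'travel-2017'}, {'id': 3, 'board': 'diy'}]
```

Running compaction periodically is what keeps downstream consumers reading one compact snapshot instead of an ever-growing change log.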
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli - Spark Summit
In the race to invent multi-million dollar business opportunities with exclusive insights, data scientists and engineers are hampered by a multitude of challenges just to make one use case a reality – the need to ingest data from multiple sources, apply real-time analytics, build machine learning algorithms, and intermix different data processing models, all while navigating around their legacy data infrastructure that is just not up to the task. This need has created the demand for Virtual Analytics, where the complexities of disparate data and technology silos have been abstracted away, coupled with a powerful range of analytics and processing horsepower, all in one unified data platform. This talk describes how Databricks is powering this revolutionary new trend with Apache Spark.
SQL Analytics for Search Engineers - Timothy Potter, Lucidworks
This document discusses how SQL can be used in Lucidworks Fusion for various purposes like aggregating signals to compute relevance scores, ingesting and transforming data from various sources using Spark SQL, enabling self-service analytics through tools like Tableau and PowerBI, and running experiments to compare variants. It provides examples of using SQL for tasks like sessionization with window functions, joining multiple data sources, hiding complex logic in user-defined functions, and powering recommendations. The document recommends SQL in Fusion for tasks like analytics, data ingestion, machine learning, and experimentation.
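Sessionization with window functions, mentioned above, is worth seeing concretely: a new session starts whenever the gap since a user's previous event exceeds a threshold, which a `LAG` plus a running `SUM` expresses directly. The sketch below runs the query through SQLite (3.25+) purely so it is executable here; in Fusion the same SQL shape would run through Spark SQL, and the table and column names are illustrative.

```python
import sqlite3

# Sessionization sketch: mark a new session when the gap to the previous
# event exceeds 1800 seconds, then number sessions with a running sum.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE signals (user_id TEXT, ts INTEGER)")
con.executemany("INSERT INTO signals VALUES (?, ?)",
                [("u1", 0), ("u1", 600), ("u1", 5000), ("u2", 100)])

rows = con.execute("""
    WITH gaps AS (
        SELECT user_id, ts,
               CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > 1800
                    THEN 1 ELSE 0 END AS new_session
        FROM signals
    )
    SELECT user_id, ts,
           SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_id
    FROM gaps ORDER BY user_id, ts
""").fetchall()
print(rows)
# u1's third event (gap of 4400s) starts session 1; everything else is session 0
```

The same two-step shape (flag boundaries, then cumulatively sum the flags) is the standard window-function idiom for sessionization in any SQL engine that supports `LAG` and windowed `SUM`.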
Speaker: Ivan Cheng, Solution Architect, AWS
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
The structured streaming upgrade to Apache Spark and how enterprises can bene... - Impetus Technologies
The adoption of Apache Spark to analyze data in real time is increasing, given its ability to handle sophisticated analytical requirements and its common framework for streaming and batch. However, most organizations are also looking for "true streaming" features like lower latency and the ability to process out-of-order data.
Structured Streaming, a new high-level API introduced in Apache Spark 2.0, promises these and other enhancements to Spark's approach to streaming data processing.
In this webinar, Anand Venugopal (Product Head) and other technical experts from StreamAnalytix speak about the promising developments in Apache Spark 2.0 and how organizations can leverage structured streaming to make timely and accurate decisions and stay competitive.
Apache Spark: Empowering the Real-Time Data-Driven Enterprise - StreamAnalytix... - Impetus Technologies
Apache Spark is one of the most popular Big Data frameworks today. It is fast becoming the de facto technology choice for stream processing, real-time analytics, data science and machine learning applications at scale. It has moved well beyond the early-adopter phase, is supported by a vibrant open source community and is enjoying accelerated adoption in enterprises.
Join guest speaker Mike Gualtieri, VP & Principal Analyst at Forrester Research, and Anand Venugopal, Product Head at StreamAnalytix, for a discussion on the trends and directions defining the growing importance of Apache Spark for stream processing, machine learning, and other advanced data analytics applications.
The SAS Search Journey: Using AI to Move from Google to Lucidworks - Alex Fl... - Lucidworks
1) SAS migrated their enterprise search from Google and another solution to Lucidworks Fusion to have a single search platform.
2) They encountered issues with the out-of-the-box configuration and content that impacted search relevance and ranking.
3) Through multiple iterations of configuration changes, indexing adjustments, and using AI/SAS tools to evaluate search terms, they improved relevance and the types of results returned for key search terms.
4) Future plans for the search include immediate indexing of new content, auto-suggest, spellcheck, and integrating search data into analytics dashboards. Lucidworks was chosen for its data analytics abilities, easy administration, and connectors.
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code... - Codemotion
Modern race cars produce a lot of data, all of it in real time. In this presentation I will show how data can be generated and used by various applications in the car, on the track, or at team headquarters. The demonstration will show how to move data using messaging systems like Apache Kafka, process the data using Apache Spark, and use various storage techniques: distributed file systems and NoSQL databases. This presentation is a great opportunity to see how to build a "near real-time big data application". The code from this talk will be made available as open source.
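The pipeline shape the talk demonstrates (sensors publish to a topic, a stream processor consumes and aggregates) can be simulated without a broker. The sketch below stands in an in-memory queue for the Kafka topic and a rolling average for the Spark aggregation; the car and sensor names are invented for illustration.

```python
from collections import defaultdict, deque

# Broker-free sketch of the telemetry pipeline pattern: sensors publish
# messages to a topic, a stream processor consumes them and keeps a rolling
# average per (car, sensor). In the real demo Apache Kafka carries the
# messages and Apache Spark does the processing.
topic = deque()  # stands in for a Kafka topic

def publish(car, sensor, value):
    topic.append({"car": car, "sensor": sensor, "value": value})

def process(window=3):
    latest = defaultdict(lambda: deque(maxlen=window))  # keep last N readings
    while topic:
        msg = topic.popleft()
        latest[(msg["car"], msg["sensor"])].append(msg["value"])
    return {k: sum(v) / len(v) for k, v in latest.items()}

for rpm in (11000, 12000, 13000, 14000):
    publish("car44", "rpm", rpm)
publish("car44", "speed_kmh", 310)

averages = process(window=3)
print(averages[("car44", "rpm")])  # 13000.0 (mean of the last 3 readings)
```

Swapping the deque for a Kafka consumer and the dict for a Spark windowed aggregation preserves the same logical flow at production scale.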
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra... - SnapLogic
At last week's Strata + Hadoop World in San Jose, CA, SnapLogic Chief Scientist Greg Benson talked to big data experts, data scientists, and other enterprise IT leaders about the data lake and how SnapLogic comes into play with Hadoop-scale data integration.
Check out this presentation to learn how SnapLogic helps customers adopt Hadoop and automate data integration workflows.
To learn more, visit: www.snaplogic.com/big-data
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases - Neo4j
The document is an agenda for the Stockholm GraphDay event. It includes sessions on graph databases in a digital economy, graph use cases, a fraud detection use case using Neo4j, and training sessions. It encourages attendees to connect on social media using the event hashtag and provides the WiFi password.
Spark Summit EU 2015: Matei Zaharia Keynote - Databricks
2015 was a year of continued growth for Spark, with numerous additions to the core project and very fast growth of use cases across the industry. In this talk, I’ll look back at how the Spark community has grown and changed in 2015, based on a large Apache Spark user survey conducted by Databricks. We see some interesting trends in the diversity of runtime environments (which are increasingly not just Hadoop); the types of applications run on Spark; and the types of users, now that features like R support and DataFrames are available in Spark. I’ll also cover the ongoing work in the upcoming releases of Spark to support new use cases.
This webinar focuses on the particular use case of graph databases in Network & IT-Management. This webinar is designed for people who work with Network Management at telecom companies or professionals within industries that handle and rely on complex networks.
We’ll start with an overview of Neo4j and graph-thinking within networks, explaining how networks are naturally modelled as graphs. We’ll explain how graph databases vastly help mitigate some of the major challenges that network and security managers face on a daily basis — including intrusions and other cyber crimes, performance optimization, outage simulations, fraud prevention, and more.
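The "networks are naturally graphs" idea above can be made concrete with a toy outage simulation: model devices as nodes and links as edges, then ask which hosts lose connectivity to the core when one router fails. The device names below are invented; a graph database like Neo4j would answer the same question with a traversal query rather than hand-written BFS.

```python
from collections import deque

# Toy network topology: a core switch, two routers, three hosts.
links = [("core", "r1"), ("core", "r2"), ("r1", "h1"), ("r1", "h2"), ("r2", "h3")]

def reachable(start, failed):
    """Breadth-first search that skips a failed device."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen and nxt != failed:
                seen.add(nxt)
                queue.append(nxt)
    return seen

all_hosts = {"h1", "h2", "h3"}
stranded = all_hosts - reachable("core", failed="r1")
print(sorted(stranded))  # ['h1', 'h2'] — both hosts behind r1 lose the core
```

The value of the graph model is that impact analysis, root-cause tracing, and intrusion-path questions all reduce to traversals like this one over the same topology data.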
JanusGraph: Looking Backward and Reaching Forward - Demai Ni
JanusGraph: Looking Backward and Reaching Forward - by Jason Plurad (@pluradj):
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in the latest release of JanusGraph, and then take a look at future directions to explore together with the open community.
This document summarizes a live webinar about creating and querying a graph database of Olympic data. It describes loading data on athletes, countries, sports, events and medals from 1896-2012 into a Neo4j graph database. It then demonstrates several example queries of the Olympic graph, such as the number of sports per games, medals per country per sport, and athletes who medaled in multiple sports.
Detecting Mobile Malware with Apache Spark with David Pryce - Databricks
“The ability to detect malware has needed to change drastically in the past few years, away from traditional signature- or list-based techniques. Couple this with the rise of mobile-device-based attacks, where the scale of the data is predicted to be 60% of the internet in 2018*, and our online lives will need Machine Learning (ML) and Data Science to ensure their security. At Wandera we have successfully implemented a malware detection (and classification) ML model at scale with the use of Apache Spark (MLlib) and the PMML-via-OpenScoring paradigm.
In this talk we will touch on the training data and why we use Spark at all, the features we extract from mobile phone applications, and how we then obtain our high accuracy scores in the cloud. *https://blog.cloudflare.com/our-predictions-for-2018/”
Accelerating Innovation with Unified Analytics with Ali Ghodsi - Databricks
Today at the 10th Spark Summit, Databricks CEO & Co-founder Ali Ghodsi revealed Databricks Serverless, a new initiative to offer serverless computing for complex data science and Apache Spark workloads. Databricks Serverless is the first product to offer a serverless API for Apache Spark, greatly simplifying and unifying data science and big data workloads for both end users and DevOps.
DATA @ NFLX (Tableau Conference 2014 Presentation) - Blake Irvine
I presented this at a 2014 Tableau Conference session with Albert Wong.
Netflix relies on data to make decisions ranging from buying and recommending content, to improving the streaming experience on devices.
This presentation shares our Big Data analytics architecture and the tools used to make data accessible throughout our business, focusing on how Tableau fits into our organization and why it aligns well with our culture.
Dive Into the Heart of Search in All Its Forms - Elasticsearch
Like most modern businesses, your teams probably use more than 10 cloud-hosted applications every day, but also spend far too much time searching those tools for the information they need. With the out-of-the-box capabilities of Elastic Workplace Search, discover how easy it is to put relevant content at your teams' fingertips through unified search across all the applications they use to do their work.
How Apache Spark Changed the Way We Hire People with Tomasz Magdanski - Databricks
As big data technology matures, you’d think there would be more talent available to hire. Although the number of people interested and engaged in the big data world has dramatically increased, job demand is far ahead.
Companies like Google or Facebook have access to the best talent — thousands of engineers with PhDs from the best schools, which is why they are able to innovate. How can a company close the skills gap while innovating and creating product advantage?
This talk highlights how the right technology can allow you to compete without having an army of PhDs at your disposal. At iPass, we’ve created an environment where our engineers can be empowered to create value without getting bogged down by big data and Ops challenges. As a result, we have been able to more easily recruit internal engineers to our big data team, leveraging their current expertise, while bringing them up to speed on big data projects much faster. Join this talk to learn how you can do the same for your organization.
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G... - Shawn Jones
In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget with which to spend their time and effort. How are they spending this budget? What are the top metadata categories in use? How did they grow over time? What purpose do they serve? We also recognize that not all metadata fields are used equally. What is the growth of individual fields over time? Which fields experienced the fastest adoption? In this paper, we review 227,726 HTML news articles from 29 outlets captured by the Internet Archive between 1998 and 2016. Upon reviewing the metadata fields in each article, we discovered that 2010 began a metadata renaissance as publishers embraced metadata for improved search engine ranking, search engine tracking, social media tracking, and social media sharing. When analyzing individual fields, we find that one application of metadata stands out above all others: social cards -- the cards generated by platforms like Twitter when one shares a URL. Once a metadata standard was established for cards in 2010, its fields were adopted by 20% of articles in the first year and reached more than 95% adoption by 2016. This rate of adoption surpasses efforts like schema.org and Dublin Core by a fair margin. When confronted with these results on how news publishers spend their metadata budget, we must conclude that it is all about the cards.
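The "card" metadata the paper measures lives in ordinary `<meta>` tags: Twitter card fields use `name="twitter:*"` and Open Graph fields use `property="og:*"`, so measuring adoption reduces to collecting those attributes. A minimal sketch of that extraction, using only the standard library; the sample HTML is invented.

```python
from html.parser import HTMLParser

# Collect Twitter-card and Open Graph fields from an HTML document's <meta>
# tags — the same fields whose adoption the paper tracks over time.
class CardMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cards = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        key = a.get("name") or a.get("property")
        if key and (key.startswith("twitter:") or key.startswith("og:")):
            self.cards[key] = a.get("content")

html = """<head>
<meta name="twitter:card" content="summary_large_image">
<meta property="og:title" content="Example Headline">
<meta name="description" content="not a card field">
</head>"""
parser = CardMetaParser()
parser.feed(html)
print(parser.cards)
# {'twitter:card': 'summary_large_image', 'og:title': 'Example Headline'}
```

Run over an archive of captured pages, counts of these keys per year yield exactly the kind of adoption curves the study reports.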
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM... - Amazon Web Services
This document introduces Amazon QuickSight, a business analytics service from AWS. QuickSight allows users to easily connect to and analyze data from various AWS and third party sources. It provides fast, self-service analytics capabilities at 1/10th the cost of traditional BI solutions. QuickSight also enables collaboration, sharing of analyses and dashboards, and future integration with machine learning capabilities. The document demonstrates QuickSight through an example implementation at Hotelbeds Group to gain insights from their large and growing data sources on AWS.
SPS Brno 2017 - Go with the Microsoft Flow - Ahmad Najjar
Microsoft Flow is a brand-new SaaS offering for automating workflows across the growing number of applications and SaaS services that business users rely on.
This session introduces Microsoft Flow and how it can help you save time, and make your business highly visible by partnering with Microsoft's growing ecosystem including O365 and SharePoint Online. It demonstrates how you can create an automated workflow between your favorite O365 apps and services to get notifications, synchronize files, collect data, and more!
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersLucidworks
This document discusses how SQL can be used in Lucidworks Fusion for various purposes like aggregating signals to compute relevance scores, ingesting and transforming data from various sources using Spark SQL, enabling self-service analytics through tools like Tableau and PowerBI, and running experiments to compare variants. It provides examples of using SQL for tasks like sessionization with window functions, joining multiple data sources, hiding complex logic in user-defined functions, and powering recommendations. The document recommends SQL in Fusion for tasks like analytics, data ingestion, machine learning, and experimentation.
講師: Ivan Cheng, Solution Architect, AWS
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
The structured streaming upgrade to Apache Spark and how enterprises can bene...Impetus Technologies
The adoption of Apache Spark to analyze data in real-time is increasing with its ability to handle sophisticated analytical requirements and a common framework for streaming and batch. However, most organizations are also looking for "true streaming" features like lower latency and the ability to process out-of-order data.
Structured Streaming, a new high-level API, introduced in Apache Spark 2.0 promises these and other enhancements to the Spark approach to streaming data processing.
In this webinar, Anand Venugopal (Product Head) and other technical experts from StreamAnalytix, speak about the promising developments in Apache Spark 2.0 and how organizations can leverage structured streaming to make timely and accurate decisions and stay competitive.
Apache spark empowering the real time data driven enterprise - StreamAnalytix...Impetus Technologies
Apache Spark is one of the most popular Big Data frameworks today. It is fast becoming the de facto technology choice for stream processing, real-time analytics, data science and machine learning applications at scale. It has moved well beyond the early-adopter phase, is supported by a vibrant open source community and is enjoying accelerated adoption in enterprises.
Join our guest speaker from Forrester Research, VP & Principal Analyst, Mike Gualtieri and StreamAnalytix, Product Head, Anand Venugopal for a discussion on the trends and directions defining the growing importance of Apache Spark for stream processing, machine learning and other advanced data analytics applications.
The SAS Search Journey: Using AI to Move from Google to Lucidworks - Alex Fl...Lucidworks
1) SAS migrated their enterprise search from Google and another solution to Lucidworks Fusion to have a single search platform.
2) They encountered issues with the out of box configuration and content that impacted search relevance and ranking.
3) Through multiple iterations of configuration changes, indexing adjustments, and using AI/SAS tools to evaluate search terms, they improved relevance and the types of results returned for key search terms.
4) Future plans for the search include immediate indexing of new content, auto-suggest, spellcheck, and integrating search data into analytics dashboards. Lucidworks was chosen for its data analytics abilities, easy administration, and connectors.
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...Codemotion
Modern race cars produce lot of data, and all this in real time. In this presentation I will show you how data could be generated and used by various applications in the car, on the track or team head quarter. The demonstration will show how to move data using messaging systems like Apache Kafka, process the data using Apache Spark and use various storage technics: Distributed File System, NoSQL Database. This presentation is a great opportunity to see how to build a " near real time big data application". The code from this talk will be made available as open source.
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...SnapLogic
At last week's Strata + Hadoop World in San Jose, CA SnapLogic Chief Scientist Greg Benson talked to big data experts, data scientists and other enterprise IT leaders about the data lake and how SnapLogic comes into play with Hadoop-scale data integration.
Check out this presentation to learn how SnapLogic helps customers adopt Hadoop and automate data integration workflows.
To learn more, visit: www.snaplogic.com/big-data
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph DatabasesNeo4j
The document is an agenda for the Stockholm GraphDay event. It includes sessions on graph databases in a digital economy, graph use cases, a fraud detection use case using Neo4j, and training sessions. It encourages attendees to connect on social media using the event hashtag and provides the WiFi password.
Spark Summit EU 2015: Matei Zaharia keynoteDatabricks
2015 was a year of continued growth for Spark, with numerous additions to the core project and very fast growth of use cases across the industry. In this talk, I’ll look back at how the Spark community is has grown and changed in 2015, based on a large Apache Spark user survey conducted by Databricks. We see some interesting trends in the diversity of runtime environments (which are increasingly not just Hadoop); the types of applications run on Spark; and the types of users, now that features like R support and DataFrames are available in Spark. I’ll also cover the ongoing work in the upcoming releases of Spark to support new use cases.
This webinar focuses on the particular use case of graph databases in Network & IT-Management. This webinar is designed for people who work with Network Management at telecom companies or professionals within industries that handle and rely on complex networks.
We’ll start with an overview of Neo4j and Graph-thinking within Networks, explaining how Neworks are naturally modelled as graphs. We’ll explain how graph databases vastly help mitigate some of the major challenges the Network and Security Managers face on daily basis — including intrusions and other cyber crimes, performance optimization, outage simulations, fraud prevention and more.
Janus graph lookingbackwardreachingforwardDemai Ni
JanusGraph: Looking Backward and Reaching Forward - by Jason Plurad (@pluradj):
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in latest release of JanusGraph, and then take a look at future directions to explore together with the open community.
This document summarizes a live webinar about creating and querying a graph database of Olympic data. It describes loading data on athletes, countries, sports, events and medals from 1896-2012 into a Neo4j graph database. It then demonstrates several example queries of the Olympic graph, such as the number of sports per games, medals per country per sport, and athletes who medaled in multiple sports.
Detecting Mobile Malware with Apache Spark with David PryceDatabricks
“The ability to detect malware has needed to drastically change in the past few years away from traditional signature or list based techniques. Couple this with the rise of mobile device based attacks, where the scale of the data is predicted to be 60% of the internet in 2018*, our online lives will need Machine Learning (ML) and Data Science to ensure its security. At Wandera we have successfully implemented a malware detection (and classification) ML model at scale with the use of Apache Spark (MLib) and the PMML via OpenScoring paradigm.
In this talk we will touch on the training data and why we use Spark at all, the features we extract from mobile phone applications and how we then obtain our high accuracy scores in the cloud. At Wandera we have successfully implemented a Malware detection (and classification) ML model at scale with the use of Apache Spark (MLib) and the PMML via OpenScoring paradigm. *https://blog.cloudflare.com/our-predictions-for-2018/”
Accelerating Innovation with Unified Analytics with Ali GhodsiDatabricks
Today at the 10th Spark Summit, Databricks CEO & Co-founder revealed Databricks Serverless, a new initiative to offer serverless computing for complex data science and Apache Spark workloads. Databricks Serverless is the first product to offer a serverless API for Apache Spark, greatly simplifying and unifying data science and big data workloads for both end-users and DevOps.
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli, Spark Summit
In the race to invent multi-million dollar business opportunities with exclusive insights, data scientists and engineers are hampered by a multitude of challenges just to make one use case a reality – the need to ingest data from multiple sources, apply real-time analytics, build machine learning algorithms, and intermix different data processing models, all while navigating around their legacy data infrastructure that is just not up to the task. This need has created the demand for Virtual Analytics, where the complexities of disparate data and technology silos have been abstracted away, coupled with a powerful range of analytics and processing horsepower, all in one unified data platform. This talk describes how Databricks is powering this revolutionary new trend with Apache Spark.
DATA @ NFLX (Tableau Conference 2014 Presentation), Blake Irvine
I presented this at a 2014 Tableau Conference session with Albert Wong.
Netflix relies on data to make decisions ranging from buying and recommending content, to improving the streaming experience on devices.
This presentation shares our Big Data analytics architecture and the tools used to make data accessible throughout our business, focusing on how Tableau fits into our organization and why it aligns well with our culture.
Dive into the depths of search in all its forms, Elasticsearch
Like most modern businesses, your teams probably use more than 10 cloud-hosted applications every day, yet they spend far too much time searching those tools for the information they need. Discover how easy it is, with the out-of-the-box capabilities of Elastic Workplace Search, to put relevant content at your teams' fingertips through unified search across all the applications they use to do their work.
How Apache Spark Changed the Way We Hire People with Tomasz Magdanski, Databricks
As big data technology matures, you’d think there would be more talent available to hire. Although the number of people interested and engaged in the big data world has dramatically increased, job demand remains far ahead of supply.
Companies like Google or Facebook have access to the best talent — thousands of engineers with PhDs from the best schools, which is why they are able to innovate. How can a company close the skills gap while innovating and creating product advantage?
This talk highlights how the right technology can allow you to compete without having an army of PhDs at your disposal. At iPass, we’ve created an environment where our engineers can be empowered to create value without getting bogged down by big data and Ops challenges. As a result, we have been able to more easily recruit internal engineers to our big data team, leveraging their current expertise, while bringing them up to speed on big data projects much faster. Join this talk to learn how you can do the same for your organization.
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G..., Shawn Jones
In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget with which to spend their time and effort. How are they spending this budget? What are the top metadata categories in use? How did they grow over time? What purpose do they serve? We also recognize that not all metadata fields are used equally. What is the growth of individual fields over time? Which fields experienced the fastest adoption? In this paper, we review 227,726 HTML news articles from 29 outlets captured by the Internet Archive between 1998 and 2016. Upon reviewing the metadata fields in each article, we discovered that 2010 began a metadata renaissance as publishers embraced metadata for improved search engine ranking, search engine tracking, social media tracking, and social media sharing. When analyzing individual fields, we find that one application of metadata stands out above all others: social cards -- the cards generated by platforms like Twitter when one shares a URL. Once a metadata standard was established for cards in 2010, its fields were adopted by 20% of articles in the first year and reached more than 95% adoption by 2016. This rate of adoption surpasses efforts like schema.org and Dublin Core by a fair margin. When confronted with these results on how news publishers spend their metadata budget, we must conclude that it is all about the cards.
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM..., Amazon Web Services
This document introduces Amazon QuickSight, a business analytics service from AWS. QuickSight allows users to easily connect to and analyze data from various AWS and third party sources. It provides fast, self-service analytics capabilities at 1/10th the cost of traditional BI solutions. QuickSight also enables collaboration, sharing of analyses and dashboards, and future integration with machine learning capabilities. The document demonstrates QuickSight through an example implementation at Hotelbeds Group to gain insights from their large and growing data sources on AWS.
SPS Brno 2017 - Go with the Microsoft flow, Ahmad Najjar
Microsoft Flow is a brand-new SaaS offering for automating workflows across the growing number of applications and SaaS services that business users rely on.
This session introduces Microsoft Flow and how it can help you save time, and make your business highly visible by partnering with Microsoft's growing ecosystem including O365 and SharePoint Online. It demonstrates how you can create an automated workflow between your favorite O365 apps and services to get notifications, synchronize files, collect data, and more!
Join Cloudian, Hortonworks and 451 Research for a panel-style Q&A discussion about the latest trends and technology innovations in Big Data and Analytics. Matt Aslett, Data Platforms and Analytics Research Director at 451 Research, John Kreisa, Vice President of Strategic Marketing at Hortonworks, and Paul Turner, Chief Marketing Officer at Cloudian, will answer your toughest questions about data storage, data analytics, log data, sensor data and the Internet of Things. Bring your questions or just come and listen!
This document summarizes Jeff Fried's presentation on search-driven intranets using SharePoint and Office 365. The presentation covered major trends in intranets, options for implementing intranets, and how search can provide a unified view of content and bridge across information silos. It demonstrated dynamic search pages, presentation of dynamic content, and finding people as content. The presentation concluded by discussing how nearly every intranet will need connectivity, structure, and context to supplement out-of-the-box SharePoint and Office 365 capabilities.
This document provides an overview and strategy for big and fast data initiatives in 2017. It discusses the data landscape including volume, velocity, variety and validity. It evaluates different data platform technologies and outlines requirements. The vision is described as "Business Insights at the Speed of Light". The strategy focuses on speed and leveraging key technologies like Spark. A roadmap with initiatives around insights, infrastructure, ingestion and big BI is presented. High level architectures for streaming and data flow are shown. Finally, data preparation vendors are compared.
Big Data in Action – Real-World Solution Showcase, Inside Analysis
The Briefing Room with Radiant Advisors and IBM
Live Webcast on February 25, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=53c9b7fa2000f98f5b236747e3602511
The power of Big Data depends heavily upon the context in which it's used, and most organizations are just beginning to figure out where, how and when to leverage it. One key to success is integration with existing information systems, many of which still rely on relational database technologies. Finding ways to blend these two worlds can help companies generate measurable business value in fairly short order.
Register for this episode of The Briefing Room to hear Analysts Lindy Ryan and John O'Brien as they explain how the combination of traditional Business Intelligence with Big Data Analytics can provide game-changing results in today's information economy. They'll be briefed by Eric Poulin and Paul Flach of Stream Integration who will share best practices for designing and implementing Big Data solutions. They'll discuss the components of IBM BigInsights, and explain how BigSheets can empower non-technical users who need to explore semi-structured data.
Visit InsideAnalysis.com for more information.
Amazon QuickSight is a fast BI service that makes it easy for you to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. QuickSight is built to harness the power and scalability of the cloud, so you can easily run analysis on large datasets and support hundreds of thousands of users. In this session, we’ll demonstrate how you can easily get started with Amazon QuickSight, uploading files, connecting to S3 and Redshift, and creating analyses from visualizations that are optimized based on the underlying data. Once we’ve built our analysis and dashboard, we’ll show you how easy it is to share it with colleagues and stakeholders in just a few seconds.
The document discusses modernizing content management with Microsoft Content Services. It describes how traditional enterprise content management (ECM) systems focused on archiving and storage, whereas content services support broader business functions like collaboration. Content services provide a more dynamic lifecycle for content creation, coordination, protection and reuse. Microsoft and its partner HELUX provide tools like Microsoft Search, SharePoint, OneDrive and Azure to help organizations manage increasing volumes of content and meet compliance requirements.
Ms net work-sharepoint 2013-applied architecture from the field v4, Tihomir Ignatov
The document provides an agenda for a presentation on SharePoint 2013 architecture lessons learned from real-world implementations. It covers topics such as software, deployment, and network architecture for SharePoint; architecture principles; governance plans; different app scenarios in SharePoint 2013; infrastructure considerations and topologies; and lessons from an Oracle to Microsoft migration project in the transport and logistics industry. Case studies discussed include implementations in government agencies and the public sector.
Overview of Krish's software development and IT consulting services for enterprise application modernization, SharePoint/Office 365, and cloud migrations.
Accelerate Self-Service Analytics with Data Virtualization and Visualization, Denodo
Watch full webinar here: https://bit.ly/39AhUB7
Enterprise organizations are shifting to self-service analytics as business users need real-time access to holistic and consistent views of data regardless of its location, source or type for arriving at critical decisions.
Data Virtualization and Data Visualization work together through a universal semantic layer. Learn how they enable self-service data discovery and improve performance of your reports and dashboards.
In this session, you will learn:
- Challenges faced by business users
- How data virtualization enables self-service analytics
- Use case and lessons from customer success
- Overview of the highlight features in Tableau
SharePoint 2016 Why Upgrade: Top 10 Compelling Features, Joel Oleson
Are you ready for SharePoint 2016? Are you sure? Have you built your business case? In this session we dig into the new features with a focus on building the real reason to upgrade...
SharePoint 2016 has a lot of new features that we inherit from the cloud, as well as a lot of new hybrid features and additional UI investments that have already proven popular in Office 365 SharePoint Online. With the IT Preview, it seems Microsoft is only talking to IT and developers.
In this session we’ll approach the features from a user perspective and help you to:
Get a first look at SharePoint 2016 for the business whether you are already planning to upgrade, or just curious
Build the business case of a more secure, auditable, and flexible SharePoint 2016 upgrade or deployment
Start planning for the next big version of SharePoint and be ready for release
Integrating Applications and Data (with Oracle PaaS Cloud) - Oracle Cloud Day..., Lucas Jellema
Integration is a challenge that has become even more urgent with the move to the cloud that all organizations are making or are about to make. Whether SaaS applications have to be enabled (linked to other SaaS applications or to custom apps) or IoT is used to integrate the physical world into enterprise IT or whether microservices (on premises) have to collaborate with microservices (in the cloud) - integration is at the heart of enterprise IT. This presentation discusses the move to the cloud, a number of common integration use cases and the key components in Oracle PaaS Portfolio for tackling these challenges. The presentation was delivered at the Oracle Cloud Day 2017 in Nieuwegein, The Netherlands
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce you to SPICE - a Super-fast, Parallel, In-memory Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides you with smart visualizations and graphs that are optimized for your different data types, ensuring the most suitable visualization for your analysis, and how to share these visualization stories using the built-in collaboration tools.
[Webinar Slides] Future-Proof Your SharePoint Investment, AIIM International
The document provides an overview of a presentation on future-proofing a SharePoint investment. Key points from the presentation include: reasons why some organizations' SharePoint efforts fail, such as insufficient planning and not focusing on usability; areas that cause workflow initiatives and IT management headaches, like developers not embracing future compatibility; and new features and capabilities coming in SharePoint 2016, such as support for larger file sizes and improved compliance and analytics tools. The presentation aims to help organizations successfully implement and develop SharePoint solutions that will continue to meet their needs over time.
The State of the Data Warehouse in 2017 and Beyond, SingleStore
The document provides an overview of the changing analytic environment and the evolution of the data warehouse. It discusses how new requirements like performance, usability, optimization, and ecosystem integration are driving the adoption of a real-time data warehouse approach. A real-time data warehouse is described as having low latency ingestion, in-memory and disk-optimized storage, and the ability to power both operational and machine learning applications. Examples are given of companies using a real-time data warehouse to enable real-time analytics and improve business processes.
Leveraging the AWS Cloud and its services not only helps reduce cost but also brings agility and innovation. One such service, Big Data, provides a paradigm shift by putting "smart" in everything we do today, including smart homes, smart cities, smart health, smart campuses, and more. We will talk about how AWS services can help reduce cost and bring agility by leveraging Big Data to bring innovation to the campus.
Breaking the Monolith: Organizing Your Team to Embrace Microservices, Paul Osman
Microservices are becoming an increasingly popular way to build software systems. Thanks to evangelism from companies like Netflix, Amazon, Gilt, ThoughtWorks and SoundCloud, more organizations are considering whether or not they should adopt this practice.
In this talk, I’ll discuss our experiences evolving 500px from a single, monolithic Ruby on Rails application to a series of composable microservices written in Ruby and Go. I’ll talk about the challenges we faced from a business, engineering, QA and operations perspective and how moving to microservices encouraged (or required) change in our organizational structure and culture.
In this talk, you’ll learn how a change in how we develop software affected team structures, development environments, testing infrastructure and encouraged us to explore moving to cloud hosting and to move closer to continuous delivery. You’ll also learn about the pitfalls, both expected and unexpected that we experienced along the way.
By sharing some of our experiences, I hope to provide some guidance to engineering teams considering whether or not to adopt microservices.
The Building Blocks of QuestDB, a Time Series Database, javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Global Situational Awareness of A.I. and where it's headed, vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake, Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data, Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W..., Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag..., sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Analysis insight about a Flyball dog competition team's performance, roli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
11. Infrastructure v1.0 - Problem
$ psql mole-redshift
> connection limit 500 exceeded for non-bootstrap users
Our team used to call this the “rainbow rangers”.
15. Parquet Transformation
Btw, we are deploying Lambda using Serverless.
Apache Parquet & Apache Arrow
1. Parquet: Columnar data on disk
2. Arrow: Columnar data in memory
Reference: https://arrow.apache.org/docs/python/parquet.html
16. Parquet Transformation
Write and upload to S3; read and use Parquet with pandas.
Lambda Function Handler:
- Enrich, cleanse, or transform data with pandas
- Write data back to S3
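A handler of the shape described above might look roughly like the sketch below. The bucket/key layout, column names, and transform logic are invented for illustration; pandas needs pyarrow (or fastparquet) installed to read and write Parquet.

```python
import io

import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Enrich/cleanse step: drop rows with missing values, add a derived column."""
    cleaned = df.dropna().copy()
    cleaned["watch_hours"] = cleaned["watch_minutes"] / 60.0
    return cleaned


def handler(event, context):
    """Hypothetical Lambda entry point: read a Parquet object from S3,
    transform it with pandas, and write the result back to S3."""
    import boto3  # imported lazily; only available/needed inside Lambda

    s3 = boto3.client("s3")
    bucket, key = event["bucket"], event["key"]

    # Parquet bytes from S3 -> pandas DataFrame.
    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_parquet(io.BytesIO(obj["Body"].read()))

    # Transform, then write the cleaned Parquet back under a new prefix.
    out = io.BytesIO()
    transform(df).to_parquet(out, index=False)
    s3.put_object(Bucket=bucket, Key=f"clean/{key}", Body=out.getvalue())
```

Keeping the transform a pure function of a DataFrame makes it testable without S3 or a Lambda runtime.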
24. Goal: to enrich user experience with article recommendations
Pipelines
ETL: Video Data
ETL: Article Data
ETL: User Engagement
ETL: Data Imports
M.L.: Article Topic Modeling
M.L.: User Reading Habits → Collaborative Filtering
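The user-reading-habits pipeline could be prototyped as item-based collaborative filtering: score each unseen article by its similarity to the articles a user already read. A minimal numpy sketch, with a made-up read-count matrix:

```python
import numpy as np

# Rows = users, columns = articles; values = implicit feedback (read counts).
reads = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 1, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

# Cosine similarity between article columns.
norms = np.linalg.norm(reads, axis=0)
sim = (reads.T @ reads) / np.outer(norms, norms)


def recommend(user_idx, top_n=2):
    """Score unseen articles by similarity to the user's reading history."""
    user = reads[user_idx]
    scores = sim @ user
    scores[user > 0] = -np.inf  # never recommend already-read articles
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]
```

A production version would use Spark's ALS or similar, but the scoring idea is the same: similarity-weighted sums over a user-item matrix.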