Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL - Mark Tabladillo
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.
This is a presentation by Peter Coppola, VP of Product and Marketing at Basho Technologies, and Matthew Aslett, Research Director at 451 Research. Join them as they discuss whether multi-model databases and polyglot persistence have increased operational complexity. They'll discuss the benefits and importance of NoSQL databases and how the Basho Data Platform helps enterprises leverage Big Data applications.
My other computer is a datacentre - 2012 edition - Steve Loughran
An updated version of the "my other computer is a datacentre" talk, presented at a Bristol University HPC seminar.
Because it is targeted at universities, it emphasises some of the interesting problems: the classic CS ones of scheduling, new ones of availability and failure handling within what is now a single computer, and emergent problems of power and heterogeneity. It also includes references, all of which are worth reading; being mostly Google and Microsoft papers, they are free to download without needing ACM or IEEE library access.
Comments welcome.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C... - Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
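To make the Druid features named above concrete, here is a minimal sketch of a native topN query combining a union datasource, a period granularity, and a sum aggregation. The datasource and column names are hypothetical, not Target's actual schema.

```python
import json

# A Druid native topN query illustrating three features mentioned in the talk:
# a union datasource, an arbitrary (period) granularity, and an aggregation.
query = {
    "queryType": "topN",
    "dataSource": {  # union datasource: query several tables as one
        "type": "union",
        "dataSources": ["store_sales", "web_sales"],  # hypothetical names
    },
    "dimension": "business_unit",
    "threshold": 10,
    "metric": "revenue",
    "granularity": {"type": "period", "period": "P1W"},  # weekly buckets
    "aggregations": [
        {"type": "doubleSum", "name": "revenue", "fieldName": "sale_amount"}
    ],
    "intervals": ["2019-01-01/2019-02-01"],
}

# In production this JSON would be POSTed to the Druid broker's /druid/v2 endpoint.
payload = json.dumps(query)
```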
Optimizing the Data Supply Chain for Data Science - Vital.AI
As we move from the Data Warehouse to the Data Supply Chain, we open our perspective to include the full life cycle of data, from raw material to data product.
To produce data products with the most value, efficiently and cost-effectively, quality control processes must be put into place at each link in the chain, driven by the requirements of data scientists. With such quality control processes in place, the burden on data scientists to cleanse data – typically 80% of their effort – can be greatly reduced.
Data Models – including schema, metadata, rules, and provenance – play a crucial role in ensuring an effective Data Supply Chain.
Each Data Supply Chain link must be defined with firm boundaries with clear lines of team responsibility – with Data Models providing the natural borders.
In this talk we will discuss the processes that must be put into place at each link in the Data Supply Chain including perspectives on:
* The definition of Data Supply Chain vs. Data Warehouse
* Tools to create, manage, utilize, and share Data Models
* Tracking Data Provenance
* ETL processes, driven by Data Models
* Collaborative processes across Data Science teams
* Visualization of Data and Data Flow across the Data Supply Chain
* Apache Hadoop and Apache Spark as enabling technologies
* Data Science
* Cross-Organizational Collaboration
* Security
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform - Savita Yadav
KMIS International Conference 2021.
This talk provides insights into the performance of predictive models for Airbnb ratings using Big Data and distributed parallel computing systems. We used two-class classification models to predict whether a property has a high or a low rating based on the features of its listing. This helps hosts know whether their property is suitable and how their listing compares to other similar listings. We compare the rating prediction models on accuracy and computing-time metrics.
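The two-class classification framing can be sketched in miniature. The talk's actual models run on Spark ML and Azure ML over real listing data; the toy logistic regression below, with made-up features and labels, only illustrates the high/low-rating setup.

```python
import math

# Illustrative only: a tiny two-class classifier over invented listing features.
# Features: [reviews_per_month, superhost (0/1)]; label: 1 = high rating.
data = [([0.2, 0], 0), ([0.5, 0], 0), ([2.5, 1], 1),
        ([3.0, 1], 1), ([0.4, 0], 0), ([2.8, 1], 1)]

w = [0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x):
    """Sigmoid probability that a listing gets a high rating."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

# Plain stochastic gradient descent on the log loss.
for _ in range(200):
    for x, y in data:
        err = predict(x) - y
        for i in range(len(w)):
            w[i] -= lr * err * x[i]
        b -= lr * err

accuracy = sum((predict(x) > 0.5) == bool(y) for x, y in data) / len(data)
```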
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0 - Denodo
In this presentation, you will see the new functionality of Denodo Platform 6.0, including the dynamic query optimization engine, enterprise deployment management, and self-service information discovery and search.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/DzRtkg.
Creating a Modern Data Architecture for Digital Transformation - MongoDB
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes - MongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect Matt Kalan sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
As Twitch grew, both the amount of data we received and the number of employees interested in that data grew rapidly. To continue empowering decision making as we scaled, we turned to Druid and Imply to provide self-service analytics to both technical and non-technical staff, allowing them to drill into high-level metrics instead of reading generated reports.
In this talk, learn how Twitch implemented a common analytics platform for the needs of many different teams supporting hundreds of users, thousands of queries, and ~5 billion events each day. This session will explain our Druid architecture in detail, including:
-The end-to-end architecture deployed on Amazon that includes Kinesis, RDS, S3, Druid, Pivot and Tableau
-How the data is brought together to deliver a unified view of live customer engagement and historical trends
-Operational best practices we learnt scaling Druid
-An example walkthrough using the platform
LendingClub RealTime BigData Platform with Oracle GoldenGate - Rajit Saha
LendingClub real-time Big Data platform with the Oracle GoldenGate Big Data Adapter. This was presented at Oracle OpenWorld 2017 in San Francisco.
Speakers:
Rajit Saha
Vengata Guruswami
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML - Jongwook Woo
This talk provides insights, performance figures, and architecture for financial fraud detection on mobile money transaction activity in Azure ML and Spark. We predicted and classified transactions as normal or fraudulent with a small sample and a massive data set using Azure ML (a traditional system) and Spark ML (Big Data). I will present predictive analysis with several classification models, experimenting in Azure ML and Spark ML. In addition, the scalability of Spark ML will be presented for the models with different numbers of nodes in Spark clusters on Amazon AWS.
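Fraud data sets are heavily imbalanced, which is why accuracy alone can mislead: a model that labels every transaction "normal" still scores high accuracy. A small sketch, with made-up labels rather than the talk's data, of the precision/recall computation that complements the accuracy metric mentioned above:

```python
# Hypothetical labels: 5% fraud rate, and an imagined model's predictions.
y_true = [0] * 95 + [1] * 5                   # 1 = fraud
y_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0]   # flags 6 transactions, misses 1 fraud

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # of flagged transactions, the fraction that were fraud
recall = tp / (tp + fn)     # of actual fraud, the fraction we caught
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here accuracy is 0.97 even though a third of the flags are false alarms, which is why precision and recall are reported alongside it.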
When it comes to dealing with large, complex, and disparate data sets, traditional database technologies are unable to keep pace with the rich analytics necessary to power today’s data-driven applications. Graph analytics databases are becoming the underlying infrastructure for AI and machine learning. These databases allow users to ask complex questions across complex data, which is not always practical or even possible at scale using other approaches. They also enable faster insights against massive data sets when combined with pattern recognition, statistical analysis, and AI/machine learning. And in the case of standards-based graph databases, they connect with popular visualization tools like Graphileon, allowing users to easily explore their data stores and quickly build compelling graph-based applications.
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access - DataWorks Summit
2.5B+ ids, 2ms latency, 15K+ TPS, and petabytes of data. These numbers outline the challenges behind eBay’s Entity Resolution Service (ERS). ERS provides a temporal map between anyid-anyid. The ERS technology stack has Hadoop as the batch layer, Couchbase as the cache layer, Spring Batch to load data into Couchbase, and a REST API at the service layer. In our presentation we will take you through the journey from concept to production release. It’s a great story and we would like to share it with you!
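The "temporal map between anyid-anyid" idea can be sketched as a time-sorted list of mappings per source id, resolved with a binary search. The real ERS serves this from Couchbase at roughly 2ms latency; the names and data shapes below are hypothetical.

```python
import bisect

# Per source id: (effective_from_timestamp, target_id) pairs, sorted by time.
# Hypothetical ids, illustrating an identity re-linked at t=500.
id_map = {
    "cookie:abc": [(100, "user:1"), (500, "user:2")],
}

def resolve(source_id, ts):
    """Return the target id that was in effect at timestamp ts, or None."""
    entries = id_map.get(source_id, [])
    # Index of the last mapping whose effective_from <= ts.
    i = bisect.bisect_right(entries, (ts, chr(0x10FFFF)))
    return entries[i - 1][1] if i else None
```

A batch layer (Hadoop in ERS's case) would rebuild these entry lists, while the serving layer answers point lookups like `resolve("cookie:abc", 300)`.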
Visualize the Knowledge Graph and Unleash Your Data - Linkurious
Slides from the webinar "Visualize the Knowledge Graph and Unleash Your Data" with Michael Grove, Vice President of Engineering and co-founder of Stardog, and Jean Villedieu, co-founder of Linkurious.
The webinar covers the topic of enterprise Knowledge Graphs and lets you experience how to visualize and analyze this data to discover actionable insights for your organization.
SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2 - Mark Tabladillo
This presentation introduces SQL Server Data Mining (SSDM) for SQL Server Professionals based on the speaker's past presentation for Microsoft TechEd. Starting with SQL Server Management Studio (SSMS), the demo includes the interfaces important for professional development, including Business Intelligence Development Studio (BIDS), highlighting Integration Services, and PowerShell. The interactive demos are based on Microsoft's Contoso Retail sample data. Finally we will evaluate where Microsoft data mining can help you in a practical business environment, which may include Oracle and SAS.
This presentation includes a complex business analytics solution using elements across the Microsoft Business Intelligence technology. This talk will not have all the steps spelled out. Therefore you should prepare -- for example, by looking at the SQL Server 2012 Developer Kit -- since attendees will be expected to have working knowledge of SQL Server relational database, SSIS, SSAS, SSRS, SSMS and SSDT. Key elements of this solution will highlight features new to SQL Server 2012. At the end of the demonstration, you will be expected to participate in groups to promote one idea which builds on the demo, and everyone will be voting for their favorite groups. Come ready to create and celebrate your peers.
Are you looking to create more stability in your professional future in uncertain economic times? Developing a social media platform is a challenge for high-end professionals and consultants. The presenter has established a blog (http://marktab.net), co-founded an online journal, become a paid video presenter, secured a spot at Microsoft TechEd, and earned credit toward his first Microsoft MVP. This presentation introduces the basic elements of a successful web strategy for 2013. The presenter shares his own statistics and first-hand experiences with website development, WordPress blog hosting, leveraging social media services (including Twitter, LinkedIn, YouTube and Facebook), and working with Microsoft.
SQL Saturday 108 -- Enterprise Data Mining with SQL Server - Mark Tabladillo
Presented at SQL Saturday 108 Redmond, WA -- This presentation introduces SQL Server Data Mining (SSDM) for SQL Server Professionals based on the speaker's past presentation for Microsoft TechEd. Starting with SQL Server Management Studio (SSMS), the demo includes the interfaces important for professional development, including Business Intelligence Development Studio (BIDS), highlighting Integration Services, and PowerShell. The interactive demos are based on Microsoft's Contoso Retail sample data. Finally we will evaluate where Microsoft data mining can help you in a practical business environment, which may include Oracle and SAS.
Presented at SQL Saturday 220, Atlanta, GA, 201305. If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.
Applied Enterprise Semantic Mining -- Charlotte 201410 - Mark Tabladillo
Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2014 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index, and will also provide a comparison between what semantic search is and what Delve does. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.
Data-driven companies need to make their data easily accessible to those who analyze it. Many organizations have adopted Looker, using LookML on AWS with a centralized analytical database and a user-friendly interface that allows employees to ask and answer their own questions to make informed business decisions.
Join our webinar to learn how our customer, Casper, an online mattress retailer, made the switch from a transactional database to Looker analytics on Amazon Redshift. Looker on Amazon Redshift can help you greatly shorten your analytics lifecycle with a simplified infrastructure and rapid cloud scaling.
Join us to learn:
• How to utilize LookML to build reusable definitions and logic for your data
• Best practices for architecting a centralized analytical database
• How Casper leveraged Looker and Amazon Redshift to provide all their employees access to their data and metrics
Who should attend: Heads of Analytics, Heads of BI, Analytics Managers, BI Teams, Senior Analysts
Presented at SQL Saturday Atlanta, May 18, 2013.
Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.
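Conceptually, a document similarity index scores pairs of documents by the overlap of their key terms. SQL Server computes this server-side (exposed through functions such as SEMANTICSIMILARITYTABLE); the sketch below only illustrates the underlying idea with a simple cosine similarity over term counts, using made-up documents.

```python
import math
import re
from collections import Counter

def terms(text):
    """Bag-of-words term counts for a document (a crude stand-in for key phrases)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = {
    "doc1": "data mining with sql server",
    "doc2": "sql server data mining models",
    "doc3": "gardening tips for spring",
}

# Rank the other documents by similarity to doc1, most similar first.
scores = sorted(((cosine(terms(docs["doc1"]), terms(body)), name)
                 for name, body in docs.items() if name != "doc1"),
                reverse=True)
```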
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc... - Denodo
Watch full webinar here: https://bit.ly/3offv7G
Presented at AI Live APAC
Advanced data science techniques, like machine learning, have proven an extremely useful tool for deriving valuable insights from existing data. Platforms like Spark, and rich libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Watch this on-demand session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
Applied Semantic Search with Microsoft SQL Server - Mark Tabladillo
Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.
Data Warehouse Design and Best Practices - Ivo Andreev
A data warehouse is a database designed for query and analysis rather than for transaction processing. An appropriate design leads to scalable, balanced and flexible architecture that is capable to meet both present and long-term future needs. This session covers a comparison of the main data warehouse architectures together with best practices for the logical and physical design that support staging, load and querying.
Microsoft Technologies for Data Science 201612Mark Tabladillo
Delivered to SQL Saturday BI Edition -- Atlanta, GA
Microsoft provides several technologies in and around Azure which can be used for casual to serious data science. This presentation provides an overview of the major Microsoft options for on-premises, cloud-based, and hybrid data science. These technologies have been used by the presenter in various companies and industries, both as a Microsoft consultant and previously as an independent consultant. The speaker also provides insights into data science careers, which helps indicate where the business is likely to be for consultants and partners.
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/yVJnti
Data virtualization starts with democratizing data access for business users, but goes well beyond that to enable the entire analytics life cycle. This session will discuss the critical role of data virtualization in the four key phases of big data analytics: Discovery of raw and enriched data, Analytic Exploration, Real-time Operationalization, and Predictive Intervention.
In this session, you will learn:
• Design of advanced analytics with a view toward business goal realization
• The role of data virtualization in enabling analytics through four key phases
• How to exploit product capabilities relevant to each stage
• Creating a system of governed self-service and collaborative analytics
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
Data Virtualization: Challenges, Uses & BenefitsDenodo
Watch full webinar here: https://bit.ly/3oah4ng
Gartner recently described data virtualization as a cornerstone of data integration architectures.
Discover:
- The benefits of a data virtualization platform
- Its growing range of uses: Lakehouse, Data Science, Big Data, Data Services & IoT
- How to create a unified view of your data assets without compromising on performance
- How to build an agile data integration architecture: on-premises, in the cloud, or hybrid
Similar to Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411 (20)
How to find low-cost or free data science resources 202006Mark Tabladillo
There are many free or low-cost resources to become better trained in data science. None of these options equals a formal degree, but short of that, these resources are helpful at least for keeping up with technology. This presentation will provide specific recommendations on free or low-cost resources based on the Team Data Science Process framework (business understanding, data engineering, modeling, deployment).
This presentation covers some of the major data science and AI announcements from the May 2020 Microsoft Build conference. Included in this talk are 1) Azure Synapse Link, 2) Responsible AI, 3) Project Bonsai & Project Moab, and 4) AI Models at Scale (deep learning with billions of parameters).
Microsoft has released Automated ML technologies for developers through ML.NET, Azure ML Service, and Azure Databricks. The presenter is a data scientist and Microsoft architect, and will give a comprehensive overview of the utility and use cases of this automated technology for production solutions. The presentation includes code you can try now.
Automated machine learning (automated ML) automates feature engineering, algorithm and hyperparameter selection to find the best model for your data. The mission: Enable automated building of machine learning with the goal of accelerating, democratizing and scaling AI. This presentation covers some recent announcements of technologies related to Automated ML, and especially for Azure. The demonstrations focus on Python with Azure ML Service and Azure Databricks.
ML.NET 1.0 release is the first major milestone of a great journey that started in May 2018 when we released ML.NET 0.1 as open source. ML.NET is an open-source and cross-platform machine learning framework for .NET developers. Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom AI into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Recommendation, Image Classification and more.
This presentation provides an overview of the technology with demos run in a Deep Learning Virtual Machine running Windows Server 2016. Code examples are in C# and F# and run in Visual Studio Community 2019. This technology is ready for production implementation and runs on .NET Core.
This presentation is the first of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
Automated machine learning (automated ML) automates feature engineering, algorithm and hyperparameter selection to find the best model for your data. The mission: Enable automated building of machine learning with the goal of accelerating, democratizing and scaling AI.
This presentation covers some recent announcements of technologies related to Automated ML, and especially for Azure. The demonstrations focus on Python with Azure ML Service and Azure Databricks.
This presentation is the fourth of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
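The hyperparameter-selection half of automated ML can be sketched in a few lines of plain Python. This is an illustration of the general idea only, not the Azure ML or ML.NET API; the one-dimensional ridge model, the data, and the lambda grid are all invented for the example.

```python
# Illustrative sketch: automated hyperparameter selection by exhaustive
# search over a grid, scored on a held-out validation set. Real Automated
# ML services search far larger spaces and also select algorithms.

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression: minimize sum((y - w*x)^2) + lam*w^2."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + lam
    return num / den

def mse(w, xs, ys):
    """Mean squared error of the fitted slope on a data set."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Noisy samples of y = 2x, split into train and validation sets.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
val_x, val_y = [5, 6], [10.1, 11.9]

# Grid search: try each candidate lambda, keep the lowest validation error.
best = min(
    ((lam, mse(fit_ridge_1d(train_x, train_y, lam), val_x, val_y))
     for lam in [0.0, 0.1, 1.0, 10.0]),
    key=lambda t: t[1],
)
print("best lambda:", best[0])
```

The same pattern scales up to what the automated services do: enumerate candidate configurations, train each, and keep the one that generalizes best on held-out data.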
NimbusML enables data scientists to use ML.NET to train models in Azure Machine Learning or anywhere else they use Python. NimbusML provides state-of-the-art ML algorithms, transforms and components, aiming to make them useful for all developers, data scientists, and information workers and helpful in all products, services and devices. The components are authored by the team members, as well as numerous contributors from MSR, CISL, Bing and other teams at Microsoft. NimbusML is interoperable with scikit-learn estimators and transforms, while adding a suite of highly optimized algorithms written in C++ and C# for speed and performance.
The trained machine learning model can be used in a .NET application with ML.NET. This presentation will outline the features of NimbusML and provide a notebook-based demonstration using Azure Notebooks.
This presentation is the third of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
201906 02 Introduction to AutoML with ML.NET 1.0Mark Tabladillo
ML.NET 1.0 release is the first major milestone of a great journey that started in May 2018 when we released ML.NET 0.1 as open source. ML.NET is an open-source and cross-platform machine learning framework for .NET developers. Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom AI into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Recommendation, Image Classification and more.
“Automated ML” is a collection of new technologies from Microsoft to enhance the data science development process. Still in preview, Auto ML for ML.NET 1.0 will be demonstrated in a Deep Learning Virtual Machine running Windows Server 2016. Code examples are in C# and run in Visual Studio Community 2019.
This presentation is the second of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
This presentation focuses on the value proposition for Azure Databricks for Data Science. First, the talk includes an overview of the merits of Azure Databricks and Spark. Second, the talk includes demos of data science on Azure Databricks. Finally, the presentation includes some ideas for data science production.
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...Mark Tabladillo
Microsoft has several Azure certifications including DP-100 (Designing and Implementing a Data Science Solution on Azure). Until this month, the exam had been in beta; however, the presenter has just passed the exam (first try). The purpose of this event is to share a viewpoint on how to study for the exam. Today, there are multiple ways to develop, deliver, and deploy R, Python, Spark, or deep learning models on Azure. The differences are important for this exam.
Big Data Advanced Analytics on Microsoft Azure 201904Mark Tabladillo
This talk summarizes key points for big data advanced analytics on Microsoft Azure. First, there is a review of the major technologies. Second, there is a series of technology demos (focusing on VMs, Databricks and Azure ML Service). Third, there is some advice on using the Team Data Science Process to help plan projects. The deck has web resources recommended. This presentation was delivered at the Global Azure Bootcamp 2019, Atlanta GA location (Alpharetta Avalon).
This presentation anchors best practices for Enterprise Data Science based on Microsoft's "Team Data Science Process". The talk introduces the concepts, describes some real-world advice for project planning, and discusses typical titles of professionals who make enterprise data science successful. These techniques also apply to AI (artificial intelligence), deep learning, machine learning, and advanced analytics.
Training of Python scikit-learn models on AzureMark Tabladillo
This intermediate-level presentation covers the latest Azure technology for deploying Python scikit-learn models on Azure. The presentation is a demo using a Microsoft Data Science Virtual Machine (DSVM), Visual Studio Code, Azure Machine Learning Service, Azure Machine Learning Compute, Azure Storage Blobs, and Azure Container Registry to train a model from a Python 3 Anaconda environment.
The presentation will include an architectural diagram and downloadable code from Github.
YouTube recording at https://www.youtube.com/watch?v=HyzbxHBpAbg&feature=youtu.be
Big Data Advanced Analytics on Microsoft AzureMark Tabladillo
This presentation provides a survey of the advanced analytics strengths of Microsoft Azure from an enterprise perspective (with these organizations being the bulk of big data users) based on the Team Data Science Process. The talk also covers the range of analytics and advanced analytics solutions available for developers using data science and artificial intelligence from Microsoft Azure.
Power BI has become an increasingly important data analytics tool. This presentation focuses on the advanced analytics options currently available in Power BI. Attendees to this talk will see:
· Microsoft’s perspective on advanced analytics development: the Team Data Science Process
· What the general options are for advanced analytics on Azure
· What the specific native advanced analytics capabilities are in Power BI
· Some ideas on pairing Power BI with other technologies in advanced analytics architectures
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)Mark Tabladillo
The Microsoft Cognitive Toolkit (CNTK) is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs.
The objectives of this presentation are to 1) describe what CNTK is, 2) present a comparative evaluation with similar technologies, 3) outline potential applications, and 4) demonstrate the technology with Jupyter Python examples.
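The directed-graph idea can be sketched in a few lines of plain Python. This is not the CNTK API, just a hypothetical miniature showing how leaf nodes (inputs and parameters) and operation nodes compose into an evaluable graph.

```python
# Minimal sketch of the computational-graph idea CNTK is built on (not the
# CNTK API): leaf nodes hold input values or network parameters, interior
# nodes apply operations to their children, and evaluation walks the DAG.

class Node:
    def __init__(self, op=None, children=(), value=None):
        self.op, self.children, self.value = op, children, value

    def eval(self):
        if self.op is None:          # leaf: input value or network parameter
            return self.value
        return self.op(*[c.eval() for c in self.children])

# y = w * x + b, a one-neuron "network" expressed as a directed graph.
x = Node(value=3.0)                  # input leaf
w = Node(value=2.0)                  # parameter leaf
b = Node(value=1.0)                  # parameter leaf
mul = Node(op=lambda a, c: a * c, children=(w, x))
y = Node(op=lambda a, c: a + c, children=(mul, b))
print(y.eval())                      # 2*3 + 1 = 7.0
```

Real toolkits add automatic differentiation over this same graph structure so the parameter leaves can be trained.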
Machine learning services with SQL Server 2017Mark Tabladillo
SQL Server 2017 introduces Machine Learning Services with two independent technologies: R and Python. The purpose of this presentation is 1) to describe major features of this technology for technology managers; 2) to outline use cases for architects; and 3) to provide demos for developers and data scientists.
How Big Companies plan to use Our Big Data 201610Mark Tabladillo
Underneath the shiny popular apps on tablets, smartphones, and entertainment channels are typically large cloud-based data centers. App developers leverage the cloud to provide advertisers with targeted sales opportunities, which has driven an ongoing shift from paper to online media. This presentation will provide updated trends and statistics for 2016 on big data usage (based on consumer use), statistical concerns with big data, and the Microsoft big data story.
Delivered to SQL Saturday Columbus, GA
Microsoft provides several technologies which can be used for casual to serious data science. This presentation provides an authoritative overview of two major categories: products and services. The products include: SQL Server Analysis Services, Excel Add-in for SSAS, Semantic Search, SQL Server R Services, Microsoft R Technologies, and F#. The services include Cortana Intelligence and Bing Predicts. These technologies have been used by the presenter in various companies and industries, and he will be speaking toward how Microsoft uses these technologies today for its largest Azure customers.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) avoids duplicate computations and can also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation, since the final ranks of chain nodes are easily calculated; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms like PageRank commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
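The float-vs-bfloat16 storage comparison above is ultimately a question of accumulation precision. As a quick stdlib illustration of how the accumulation strategy alone changes a reduction's result (this is not the benchmark code itself):

```python
# Naive left-to-right summation lets rounding error accumulate, while
# math.fsum uses compensated (Shewchuk) summation and returns the
# correctly rounded sum of the same values.
import math

xs = [0.1] * 10
naive = sum(xs)       # left-to-right accumulation: rounding error survives
exact = math.fsum(xs) # compensated summation: exact here
print(naive == 1.0, exact == 1.0)  # False True
```

Lower-precision storage types like bfloat16 amplify exactly this effect, which is why the choice of storage and reduction strategy is benchmarked together.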
6. Definition
Data mining is the automated or semi-automated process of discovering patterns in data
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
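As a minimal illustration of automated pattern discovery, here is a tiny 1-D k-means clusterer in plain Python. SQL Server's clustering algorithm is far more sophisticated; this invented example only shows structure being learned from data rather than hand-coded.

```python
# 1-D k-means with two clusters: alternately assign each point to its
# nearest center, then move each center to the mean of its points.

def kmeans_1d(points, c1, c2, iters=10):
    for _ in range(iters):
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(a) / len(a)
        c2 = sum(b) / len(b)
    return c1, c2

# Two obvious groups; the algorithm discovers their centers on its own.
data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centers = kmeans_1d(data, c1=0.0, c2=5.0)
print(centers)
```

No rule about where the groups lie was written anywhere in the code; the pattern emerges from the data, which is the definition of data mining given above.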
13. What / Why / How
Relational Data Warehouse
Why: Familiar way to store, fast retrieval, consistency, scalable
How: Database, relational constructs, indexes
Hadoop & HDInsight
Why: Large amounts, divide and conquer, analyzing unstructured data, flexible schema
How: Distributed computing
Tabular
Why: Fast calculations
How: In-memory, columns over rows
Multidimensional OLAP
Why: Slice and dice, ad hoc querying
How: Expands star schema into cube, preaggregated calculations
Data Mining & Machine Learning
Why: Patterns, predictions, high volume
How: Algorithms, estimations
18. Excel Data Mining Add-In
For Office 2007: The 32-bit data mining add-in works with SQL Server 2008 or 2008 R2:
http://www.microsoft.com/en-us/download/details.aspx?id=7294
For Office 2010: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
For Office 2013: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
19. Secret: Data Science provides an Epistemology
Data mining is part of a complete data science cycle
24. Gartner 2013
Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
25. Gartner 2013
Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
26. KDnuggets 2014: What Analytics, Big Data, Data Mining, Data Science software have you used in the past 12 months for a real project?
http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
27. KDnuggets 2014: What Analytics, Big Data, Data Mining, Data Science software have you used in the past 12 months for a real project? (continued)
http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
34. Data platform: SQL Server 2014
Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project "Austin"*
Analytical Services: Analysis Services*, Data Mining, PowerPivot*
Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
35. Secret: Microsoft offers two choices
SQL Server Analysis Services = SQL Server Data Mining
Microsoft Azure Machine Learning
36. Advanced analytic tools for data scientists
• Advanced descriptive analytics (e.g. clustering algorithm in SQL Server Analysis Services)
• Predictive analytics (Neural Nets, Regression, Decision Tree, Time Series, Naïve Bayes algorithms in SQL Server Analysis Services)
• Further advanced analytics (Semantic Search and geospatial data and functions in SQL Server 2012)
• Big Data analytics (Hadoop integration)
38. SSAS Data Mining Capacities
SQL Server 2014 Analysis Services object — maximum sizes/numbers:
Maximum data mining models per structure: 2^31-1 = 2,147,483,647
Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
Maximum data mining attributes (variables) per structure: 64K
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
41. Future: Most data is Text
• Quantitative research = data mining
• Qualitative research = text mining
Two Research Types
The future is combining both
42. Semantic Search indexing pipeline (iFilter required): documents flow through iFilters into the Full-Text Keyword Index ("FTI"); the semantic database then builds the Semantic Key Phrase Index / Tag Index ("TI") and the Semantic Document Similarity Index ("DSI").
43. Languages Currently Supported
Traditional Chinese
German
English
French
Italian
Brazilian Portuguese
Russian
Swedish
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
45. Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times less than 3 sec for the corpus
Similar to or better than main database search competitors
(2012, Michael Rys, Microsoft)
46. Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
47. Text Mining References
Video
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
49. Major Websites
SQL Server Data Mining
http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx
http://www.sqlserverdatamining.com/
Microsoft Azure Machine Learning (currently in preview) http://azure.microsoft.com/en-us/services/machine-learning/
50. Software
DreamSpark (students); BizSpark (businesses)
SQL Server 2014 Enterprise (includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx
Microsoft Office
http://office.microsoft.com/en-us/
Primer on Power BI -- MarkTab
http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business-intelligence.aspx
53. Conclusion
Excel data mining
Data Science provides an epistemology
Microsoft is an analytics competitor
Many already have Microsoft analytics
Microsoft offers two enterprise solutions
Semantic search scales linearly
54. Abstract
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.