The document introduces FeatureByte, a platform for creating machine learning features. It allows users to quickly create features, experiment with different features, and deploy features without needing to manage pipelines. The document provides an overview of FeatureByte's architecture and how it leverages existing data warehouses. It then demonstrates how to use FeatureByte to transform nine feature ideas for a grocery dataset into actual machine learning inputs, including summarizing the process for each feature idea. Finally, it shows how to obtain training data and deploy the new features via a REST API.
Simplify Feature Engineering in Your Data WarehouseFeatureByte
Feature Engineering is critical to successful delivery of AI solutions. Crafting relevant features from organization data requires business domain knowledge and creativity, powered by human capital in data science teams.
With growing adoption of machine learning and AI in organizations, there is a pressing need to develop processes around ML development and deployment to maximize productivity with limited resources. While there is no lack of tools for ML model management, solutions for feature engineering remains inadequate.
In this presentation, we outline our approach and design to make feature engineering efficient, repeatable and enjoyable for data science practitioners so they can experiment and iterate fast, without overlooking important issues such as scalability, deployment and auditability.
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...Sparkhound Inc.
Whether developing a small customization or a large enterprise solution, one goal is to minimize redundancy in Code. In this presentation, Sparkhound Consultant Ted Wagner shows how the MVP design pattern is used in SharePoint to create business models that can be reused easily between other ASP or C# application.
Simplify Feature Engineering in Your Data WarehouseFeatureByte
Feature Engineering is critical to successful delivery of AI solutions. Crafting relevant features from organization data requires business domain knowledge and creativity, powered by human capital in data science teams.
With growing adoption of machine learning and AI in organizations, there is a pressing need to develop processes around ML development and deployment to maximize productivity with limited resources. While there is no lack of tools for ML model management, solutions for feature engineering remains inadequate.
In this presentation, we outline our approach and design to make feature engineering efficient, repeatable and enjoyable for data science practitioners so they can experiment and iterate fast, without overlooking important issues such as scalability, deployment and auditability.
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...Sparkhound Inc.
Whether developing a small customization or a large enterprise solution, one goal is to minimize redundancy in Code. In this presentation, Sparkhound Consultant Ted Wagner shows how the MVP design pattern is used in SharePoint to create business models that can be reused easily between other ASP or C# application.
This is our contributions to the Data Science projects, as developed in our startup. These are part of partner trainings and in-house design and development and testing of the course material and concepts in Data Science and Engineering. It covers Data ingestion, data wrangling, feature engineering, data analysis, data storage, data extraction, querying data, formatting and visualizing data for various dashboards.Data is prepared for accurate ML model predictions and Generative AI apps
This is our project work at our startup for Data Science. This is part of our internal training and focused on data management for AI, ML and Generative AI apps
In this webinar, we will take a look on deploying Power BI Report in Dynamics 365 FOE using Entity Store and its entire configuration. We will take a look on how to create Analytics elements and discuss how to refresh it in Operations for using as DirectQuery. This will include configurations of Power BI report in D365 FOE workspaces.
Information on an Appcelerator Alloy project demonstrating the use of a restApi sync adapter along with Model/Collection Databinding to a TableView
Complete Project here on Github: https://github.com/aaronksaunders/scs-backbonetest1
Learn how to build advanced GraphQL queries, how to work with filters and patches and how to embed GraphQL in languages like Python and Java. These slides are the second set in our webinar series on GraphQL.
This is our contributions to the Data Science projects, as developed in our startup. These are part of partner trainings and in-house design and development and testing of the course material and concepts in Data Science and Engineering. It covers Data ingestion, data wrangling, feature engineering, data analysis, data storage, data extraction, querying data, formatting and visualizing data for various dashboards.Data is prepared for accurate ML model predictions and Generative AI apps
This is our project work at our startup for Data Science. This is part of our internal training and focused on data management for AI, ML and Generative AI apps
In this webinar, we will take a look on deploying Power BI Report in Dynamics 365 FOE using Entity Store and its entire configuration. We will take a look on how to create Analytics elements and discuss how to refresh it in Operations for using as DirectQuery. This will include configurations of Power BI report in D365 FOE workspaces.
Information on an Appcelerator Alloy project demonstrating the use of a restApi sync adapter along with Model/Collection Databinding to a TableView
Complete Project here on Github: https://github.com/aaronksaunders/scs-backbonetest1
Learn how to build advanced GraphQL queries, how to work with filters and patches and how to embed GraphQL in languages like Python and Java. These slides are the second set in our webinar series on GraphQL.
Similar to Transforming Feature Ideas into Machine Learning Inputs (20)
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
2. Transforming Feature Ideas into actionable ML inputs
2
FeatureByte is a free and source-available platform designed to
empower data enthusiasts who love creating innovative features and
improving model accuracy through data.
With FeatureByte, you can:
● Quickly create features,
● Seamlessly experiment with features and
● Effortlessly deploy features
All without the headache of creating and managing new pipelines.
3. Free and source available platform released on May 8!
3
You can access our repository at https://github.com/featurebyte/featurebyte and our documentation at
https://docs.featurebyte.com/0.2/. Please check it out and give us your feedback! We're eager to hear your thoughts and
improve FeatureByte to better suit your needs.
4. Agenda
4
● Quick introduction to FeatureByte Architecture
● Introduction to the Grocery dataset
● Transform 9 Features Ideas
● Get training data and deploy our new features in production!
6. 6
Your DataBricks, Snowflake or Spark data warehouse is used as:
● data source
● compute engine
● feature store
The FeatureByte Service, which runs on Docker, acts
as the interface between the SDK and your data
warehouse.
It automatically manages tasks like validating
requests, scheduling and executing tasks, and storing
metadata.
This Python module transpiles the feature logical plan
to platform-specific SQL
Light-weight installation that leverages your Data Warehouse
You interact with the tool through the SDK, which
provides an intuitive framework for creating, sharing,
experimenting, deploying, and managing features.
7. Grocery dataset
Note: To access the Grocery dataset or play with FeatureByte's tutorials - you can install a pre-built local Spark data warehouse with pre-populated data, via
the command featurebyte.playground().
8. 8
We will work with a Catalog where 4 tables were registered from a grocery dataset.
Dimension table with
static description on
products
Item table with details on
purchased products in
customer invoices
Event table where each
row indicates an invoice
Slowly Changing Dimension table that contains data
on customers that change over time.
A catalog is an object that help organize your
feature engineering assets by domain
9. Simple observation set to preview features
9
Observation sets are used to materialize historical
feature values. They combine entity values and
historical points-in-time that you want to learn from.
We will work here with a small Pandas DataFrame to
preview features when customers are doing a purchase.
The observation set is retrieved from a sample of the
GROCERYINVOICE table’s view.
The column containing points-in-time must be named
‘POINT-IN-TIME’
The column containing entity values must be named
with an accepted serving name as defined during the
entity registration.
View objects are used to prepare data. Work like SQL
views but with a syntax similar to Pandas!
11. 9 feature ideas for the Customer entity
1. Frequency: the count of invoices by Customer in the past 12 weeks
2. Monetary: the amount spent by Customer in the past 12 weeks
3. Recency: the time since the Customer’s last invoice
4. Basket: the sum spent by Customer per Product Group in the past 12 weeks
5. Diversity: the entropy of the Customer’s basket per Product Group in the past 12 weeks
6. Stability: the cosine similarity of the Customer’s basket in the past 2 weeks compared to their
basket in the past 12 weeks
7. Similarity: the cosine similarity of the Customer’s basket compared to the baskets of Customers
living in the same state in the past 12 weeks
8. Regularity: the entropy of weekdays of the customer invoices in the past 12 weeks
9. Change: whether the Customer’s Address changed in the past 12 weeks
11
12. Feature Idea 1: Frequency
Count of invoices by Customer in the past 12 weeks
12
The groupby() method enables to group a view by
columns representing entities.
The aggregate_over() method enables aggregation
operation over a window prior to the observation point,
here 12 weeks.
Feature objects contain the logical plan to compute a feature
13. Feature Idea 2: Monetary
Amount spent by Customer in the past 12 weeks
13
Aggregation operations include latest, count, sum,
average, minimum, maximum, and standard deviation.
14. Feature Idea 3: Recency
Time since Customer last invoice
14
RequestColumn.point_in_time() enables the creation of
features that compare the value of an other feature with
the observation point
Another example of using this method would be to
derive a customer's age from their birth date and the
observation point.
15. Feature Idea 4: Basket
Sum spent by Customer per Product Group in the past 2 and 12 weeks
15
Aggregation can be done across a categorical column,
here ProductGroup.
When the feature is materialized, a dictionary is
returned.
In our example, the keys represent the ProductGroup,
and their corresponding values represent the total
amount spent on each Product Group.
16. Feature Idea 5: Diversity
Entropy of the Customer’s basket in the past 12 weeks
16
A dictionary feature can be transformed into another
feature. We apply here the entropy.
17. Feature Idea 6: Stability
Cosine Similarity of the Customer’s basket in the past 2 weeks vs their
basket in the past 12 weeks
17
2 dictionary features with different
windows can be compared using the cosine
similarity.
18. Feature Idea 7: Similarity
Cosine Similarity of the Customer’s basket in the past 12 weeks
vs the baskets of Customers living in the same state
18
When 2 dictionary features are grouped with 2 different
entities,
● they can be still compared using the cosine similarity.
● the entity of the resulting feature is determined based
on the relationship between the 2 entities.
In our example, the primary entity of
“Customer_Similarity_with_State_12w” is
grocerycustomer because grocerycustomer
is a child of the frenchstate entity.
The feature can be served by providing only
the grocerycustomer entity values. Values
for frenchstate entity are derived
automatically by FeatureByte.
19. Feature Idea 8: Regularity
Entropy of weekdays of the Customer’s invoices in the past 12 weeks
19
In this example, the keys of the dictionary will represent
the weekday, and their corresponding values will
represent the total amount spent on each weekday.
The weekdays dictionary feature is here transformed
into another feature by applying the entropy.
20. Feature Idea 9: Change
Whether Customer’s Address Changed in the past 12 weeks
20
Change View can be created from a Slowly Changing Dimension
(SCD) table to analyze changes that occur in an attribute, here the
StreetAddress of the grocerycustomer.
22. Audit one feature via its Definition file
22
The feature definition file is critical for maintaining the integrity
of a feature. It serves as the single source of truth for the feature,
providing an explicit outline of the intended operations of the
feature declaration, including inherited operations.
The file is automatically generated when a feature is declared or a
new version is derived. It is used to generate the final logical
execution graph, which is then transpiled into platform-specific
SQL for feature materialization.
24. Compute training data
24
You can choose to store your training data in the feature store for
reuse or audit.
If you connect FeatureByte to your data warehouse, the
computation of features is performed in the data warehouse,
taking advantage of its scalability, stability, and efficiency.
In this case, you can work with much larger observation tables with
millions of rows.
26. Thank You!
To get started, check out
Repo: https://github.com/featurebyte/featurebyte
Documentation: https://docs.featurebyte.com/0.2/
27. 27
Catalog objects help organize
assets by domain
View objects are used to prepare data. Work like SQL
views but with a syntax similar to Pandas!
Table objects centralize essential
metadata for feature engineering.
4 types: event table, item table,
SCD table and dimension table.
Entity objects and relationships facilitate
feature organization and serving Feature objects contain the logical plan to compute a feature
Python SDK to register, share and manage tables, features and
more…