CrowdFlower NDA Crowds - Secure, exceptional tasking at a massive scale. CrowdFlower
In this webinar we talked about CrowdFlower's partnership with iMerit to offer users a new standard of data security, trust, and confidentiality through CrowdFlower's NDA crowd offering.
CrowdFlower Product Webinar - Graphical Editor and Visual Reports - CrowdFlower
Jack Shay (VP of Product) and Romeo Leon (VP of Customer Success) demo two great new CrowdFlower features: our new and improved Graphical Editor and the beta version of our Visual Reports feature.
We'll be creating a new job with the Graphical Editor, then showing off the report functionality on results from that same job.
On Monday, Oct. 21, CrowdFlower ran its first session of CrowdFlower University. The turnout was incredible: more than 30 people representing a range of organizations, from startups to Fortune 100 companies. It was an engaging four-hour session filled with great questions.
Building Better Models Faster Using Active Learning - CrowdFlower
Active learning is an increasingly popular technique for rapidly iterating on machine learning models, exploiting the fact that the current state of the model can be used to predict which additional examples will be the most informative. Active learning is appealing for two main reasons: it optimizes ongoing human involvement in the model-building process, and it helps overcome the negative effects of imbalanced training data. In this talk, Nick explains how active learning helps overcome common obstacles to building successful models, and also offers a peek into CrowdFlower's new active-learning-based offering, CrowdFlower AI.
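The core idea of that abstract, pool-based active learning with uncertainty sampling, can be sketched in a few lines. This is a hedged toy illustration, not CrowdFlower's implementation: the one-feature "model" and the example pool are invented.

```python
import math

def predict_proba(model_weight, x):
    """A toy one-feature 'model': probability that x is positive."""
    return 1.0 / (1.0 + math.exp(-model_weight * x))

def most_uncertain(model_weight, unlabeled, k=2):
    """Pick the k examples whose predicted probability is closest to 0.5;
    these are the ones a human label would teach the model the most about."""
    return sorted(unlabeled, key=lambda x: abs(predict_proba(model_weight, x) - 0.5))[:k]

pool = [-4.0, -0.2, 0.1, 3.5, 0.05]
print(most_uncertain(1.0, pool))  # examples near the decision boundary come first
```

In a real loop, the selected examples would be sent to human annotators and the model retrained on their labels before the next selection round.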
Adaptive Content equals Architecture plus Process minus Reality - Noz Urbina
Adaptive content is one of the most powerful and critical concepts of this decade. It is an attempt to address a never-before-seen diversity of content contexts and platforms, as well as sky-high user expectations. We are in an age where our smartphones are already starting to bore us. What were head-spinning miracles of science and technology less than three years ago “lack innovation” today. With customers assimilating new technologies into their lives and resetting expectations at this speed, the pressure to provide innovative, differentiating and strategically significant content experience is higher than ever. New platforms and interface paradigms are just around the corner. Adaptive content promises to help us address these challenges, but it still takes organisations years to adapt themselves. Noz Urbina focuses on how content architecture and process need to be altered for adaptive content, and what to do when reality sets in.
Data Foundation for Analytics Excellence by Cathy Tanimura from Okta - Tin Ho
This is a presentation by Cathy Tanimura, Director of Analytics & Big Data at Okta, given at the Predictive Analytics & Business Insights 2014 conference (USA), November 19, 2014.
Code Wars: Database Decisions for Application Development - Neo4j
Jennifer Reif, Neo4j ( https://twitter.com/JMHReif )
From relational to NoSQL to graph, we will explore various types of data, the way it is stored, and how best to go about retrieving it.
LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ... - Vasont Systems
With the challenges of a globally dispersed team, a wide variety of products, a unique and varied publishing model, and continuous corporate acquisitions and divestitures, LSI Corporation has conquered the issue of global content collaboration within their organization. Learn how they did it in this case study presentation.
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity - jordigilnieto
The presentation explains the journey from a monolithic architecture to Spring Cloud Microservices for application development inside a financial entity, along with the transition to DevOps strategies… a journey that has just begun…
Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Taylor - Lucidworks
Breakout presentation in Machine Learning Track at Bio IT World West 2019 / Molecular Med Tri-Conference. Presented by Lucidworks VP Global Partners and Alliances, Simon Taylor.
Frank Bien, CEO of Looker, along with Amazon, Google, and other data disrupters, discusses how innovators are deeply integrating analytics into every aspect of their businesses, from mobile to warehouse to cloud.
Frank shares Looker's vision for the future of business intelligence and data analytics and reveals pivotal product and partnership updates.
Advanced Analytics Implementations at EA Scale - Ani Lopez
Designing and managing advanced analytics implementations for 35+ digital properties, across more than 10 different production teams, is not an easy task. It became even more challenging when the analytics team at Electronic Arts (EA) was required to migrate from Adobe Site Catalyst to Google Analytics Premium to Google Universal Analytics, all within the span of 18 months. Of course, all of this had to be done while still keeping pace with the company’s frenzied schedule for publishing blockbuster games, each of which requires new sites and countless marketing campaigns. See how we used Tealium iQ™ tag management system to achieve our objectives.
The Next Generation of AI-Powered Search - Lucidworks
Trey Grainger, Chief Algorithms Officer at Lucidworks, delivers the closing keynote for ACTIVATE 2019, the Search and AI Conference hosted by Lucidworks.
Usually the last system to be implemented, but many times the most important lifeline for the customer, is the software help system.
Ideally, your software system is perfectly understandable and problem-free from lots of user testing and software iteration, but that’s usually not the case. Your customers may get frustrated and leave the site, and may even complain about their crappy experience to others. Providing your customers with effective Help is the last chance you have to turn a frustrating experience into a meaningful one.
As a User Experience professional, what Help strategies should you consider? What content do you need? How do you find an appropriate Help Authoring Tool for your software product? There are many vendor systems out there that offer many features. What do you really need? Attendees will learn about help system considerations such as: how to connect the help system to your software, content management features, content strategy, localization, statistics, and more, which will assist you in finding a solution that helps your frustrated customers become happy customers.
How to Make your Graph DB Project Successful with Neo4j Services - Neo4j
Neo4j is widely used across many industries to tackle a multitude of modern-day business challenges. From powering Walmart’s retail recommendation system, to detecting fraud at Fortune 500 financial institutions, to optimizing delivery service routing at eBay, the Neo4j team has helped implement projects across a wide spectrum of industries and use-cases. Using this breadth of experience combined with a unique expertise in the application of graph databases, the Neo4j Consulting team offers a number of services ranging from product training, PoC evaluations and early data modelling, to getting projects into production on the Neo4j graph database.
Attend this webinar to hear how other top organisations have quickly and successfully launched their graph database projects by leveraging Neo4j Consulting Services and learn more about the different offerings available.
Not many organizations have fitted Hadoop into a Scrum development framework. This deck sets out the expectations around what needs to be done in Hadoop in order to support Scrum.
Tips in migrating to SharePoint 2016 or O365, to avoid a migration headache - Mike Maadarani
Migrating data into any platform is a difficult task, especially if you are moving into Office 365. Whether you are migrating to SharePoint On-Premises or O365, preparation, good planning, and detailed execution are key to avoiding a migration nightmare.
This session will teach you a methodology, refined over many previous migrations, to help you deliver a successful migration project with happy users. We will cover the steps you need in your pre-migration analysis, migration checklists, post-migration support, and the issues you might face during and after completing the migration effort.
Slides from Enterprise Search & Analytics Meetup @ Cisco Systems - http://www.meetup.com/Enterprise-Search-and-Analytics-Meetup/events/220742081/
Relevancy and Search Quality Analysis - By Mark David and Avi Rappoport
The Manifold Path to Search Quality
To achieve accurate search results, we must come to an understanding of the three pillars involved.
1. Understand your data
2. Understand your customers’ intent
3. Understand your search engine
The first path passes through Data Analysis and Text Processing.
The second passes through Query Processing, Log Analysis, and Result Presentation.
Everything learned from those explorations feeds into the final path of Relevancy Ranking.
Search quality is focused on end users finding what they want; technical relevance is sometimes irrelevant! Working with the short head (very frequent queries) has the best return on investment for improving the search experience: tuning the results, for example, to emphasize recent documents or de-emphasize archived ones, detecting near-duplicates, exposing diverse results for ambiguous queries, using synonyms, and guiding search via best bets and auto-suggest. Long-tail analysis can reveal user intent by detecting patterns, discovering related terms, and identifying the most fruitful results by aggregated behavior. All of this feeds back into regression testing, which provides reliable metrics to evaluate the changes.
By merging these insights, you can improve the quality of the search overall, in a scalable and maintainable fashion.
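The "short head" idea above is easy to make concrete: count queries in the log and take the most frequent ones until they cover a chosen share of traffic. A minimal sketch, with an invented sample log and a 50% coverage cutoff as the assumption:

```python
from collections import Counter

# Invented sample query log; in practice this comes from search log analysis.
log = ["reset password", "vpn setup", "reset password", "holiday policy",
       "reset password", "vpn setup"]

counts = Counter(log)
total = sum(counts.values())

# The short head: the most frequent queries covering, say, half of all traffic.
short_head, covered = [], 0
for query, n in counts.most_common():
    short_head.append(query)
    covered += n
    if covered >= total / 2:
        break
print(short_head)  # the few queries worth hand-tuning first
```

With real logs the same loop tells you which handful of queries deserve best bets, synonym work, and manual result tuning first.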
Citihub Open Source and Cloud approach to Social Media Listening - Chris Allison
Citihub Consulting discusses open, flexible technology solutions for Social Listening based on open source technologies running on public cloud platforms.
Mark Dehmlow, Head of the Library Web Department at the University of Notre Dame
At the University of Notre Dame, we recently implemented a new website in concert with rolling out a “next generation” OPAC into production for our campus. While much of the pre-launch feedback was positive, once we implemented the new systems, we started receiving a small number of intense criticisms and a small wave of problem reports. This presentation covers how to plan for big technology changes, prepare your organizations, effectively manage the barrage of post implementation technical problems, and mitigate customer concerns and criticisms. Participants are encouraged to bring brief war stories, anecdotes, and suggestions for managing technology implementations.
We live in a world of silos - separate systems each with data essential to our daily work. No organization has all its important information in one place - 61% of knowledge workers regularly access 4 or more systems to get the information they need to do their jobs, and 15% need 11 or more systems. Integration to provide a unified view across these systems is very valuable, but it has been difficult to accomplish - even between different Microsoft products. This seminar will show you how to bridge across these silos using a search-based approach that is both quick and powerful.
Webinar: Question Answering and Virtual Assistants with Deep Learning - Lucidworks
In this webinar, Lucidworks Data Scientists Sanket Shahane and Sava Kalbachou will look at how Deep Learning can be used to create Question Answering and Virtual Assistant type systems and the accuracy and performance of different approaches. We’ll even demo an insurance-industry question answering system scenario.
Bridging the Gap Between Data Science & Engineer: Building High-Performance T... - ryanorban
Data scientists, data engineers, and data businesspeople are critical to leveraging data in any organization. A common complaint from data science managers is that data scientists invest time prototyping algorithms, and throw them over a proverbial fence to engineers to implement, only to find the algorithms must be rebuilt from scratch to scale. This is a symptom of a broader ailment -- that data teams are often designed as functional silos without proper communication and planning.
This talk outlines a framework to build and organize a data team that produces better results, minimizes wasted effort among team members, and ships great data products.
Lean Analytics is a set of rules to make data science more streamlined and productive. It touches on many aspects of what a data scientist should be and how a data science project should be defined to be successful. During this presentation Richard will present where data science projects go wrong, how you should think of data science projects, what constitutes success in data science and how you can measure progress. This session will be loaded with terms, stories and descriptions of project successes and failures. If you're wondering whether you're getting value out of data science, how to get more value out of it and even whether you need it then this talk is for you!
What you will take away from this session
Learn how to make your data science projects successful
Evaluate how to track progress and report on the efficacy of data science solutions
Understand the role of engineering and data scientists
Understand your options for processes and software
In this talk we cover
1. Why NLP and DL
2. Practical Challenges
3. Some Popular Deep Learning models for NLP
Today you can take any webpage in any language and translate it automatically into a language you know. You can also paste an article or other document into NLP systems and immediately get a list of the companies and people it talks about, the topics that are relevant, and the sentiment of the document. When you talk to the Google or Amazon assistant, you are using NLP systems. NLP is not perfect, but given the advances of the last two years, and continuing, it is a growing field. Let's see how it actually works, specifically using deep learning.
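The kind of output described above (entities plus sentiment) can be illustrated with a deliberately tiny, rule-based toy. This is not deep learning and not any real NLP library's API; the lexicons and the capitalization heuristic are invented stand-ins for what trained models do far better:

```python
# Toy stand-ins for a learned sentiment lexicon and a trained NER model.
POSITIVE = {"growth", "record", "strong"}
NEGATIVE = {"loss", "decline", "weak"}

def analyze(text):
    tokens = text.replace(",", "").split()
    # Toy "NER": treat capitalized mid-sentence tokens as candidate company names.
    entities = [t for t in tokens[1:] if t[0].isupper()]
    # Toy sentiment: positive minus negative lexicon hits.
    score = sum(t.lower() in POSITIVE for t in tokens) - \
            sum(t.lower() in NEGATIVE for t in tokens)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return entities, sentiment

print(analyze("Yesterday Acme reported record growth, while Globex posted a loss"))
```

A deep learning pipeline replaces both heuristics with learned models, but the interface (text in, entities and sentiment out) is the same.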
About Shishir
Shishir is a Senior Data Scientist at Thomson Reuters working on Deep Learning and NLP to solve real customer pain points, even ones customers have become used to.
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ... - Sonya Liberman
Outbrain is the world’s largest discovery platform, bringing personalized and relevant content to audiences while helping publishers understand their audiences through data.
Its recommender system serves billions of content recommendations daily, based on millions of hourly user interactions.
Our predictive models span a variety of supervised learning techniques, ranging from content-based recommenders, through behavioral models, all the way to collaborative techniques such as factorization machines. Agility and stability are crucial aspects of the system.
This talk will cover our journey toward solutions that compromise on neither scale nor model complexity, and the design of a dynamic framework that shortens the cycle between research and production.
We will cover the different stages of the framework, including important take away lessons for data scientists as well as software engineers.
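For readers unfamiliar with the factorization machines mentioned among the collaborative techniques, the prediction function is compact enough to sketch directly: a bias, a linear term, and pairwise feature interactions modeled through latent vectors. The weights and feature vector below are invented toy values, not Outbrain's models:

```python
def fm_predict(w0, w, V, x):
    """Factorization machine prediction.
    w0: global bias, w: per-feature weights,
    V: per-feature latent vectors, x: feature values."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]   # latent interaction between features i and j
    return w0 + linear + pairwise

# Toy example: two active features (think one-hot user id and item id).
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
print(fm_predict(0.5, [0.2, 0.1, 0.0], V, [1.0, 1.0, 0.0]))
```

The latent vectors let the model generalize interactions to user/item pairs never seen together in training, which is why FMs work well on the sparse one-hot data typical of recommenders.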
Sonya Liberman leads a team of Machine Learning Engineers and Data Scientists building large-scale recommender systems for personalized content discovery at Outbrain, serving tens of billions of real-time recommendations a day.
She especially enjoys bringing theory to production and seeing how it affects the engagement of (many) users.
This invited talk was given at ILTechTalk Week, 2018 by Shaked Bar, a Tech Lead and Algorithms Engineer on the team.
The Data Lake and Getting Businesses the Big Data Insights They Need - Dunn Solutions Group
Do terms like "data lake" confuse you? You're not alone. With all of the technology buzzwords flying around today, it can become a task to keep up with and clearly understand each of them. However, a data lake is definitely something worth dedicating the time to understand. Leveraging data lake technology, companies are finally able to keep all of their disparate information and streams of data in one secure location, ready for consumption at any time; this includes structured, unstructured, and semi-structured data. For more information on our Big Data Consulting Services, don't hesitate to visit us online at: http://bit.ly/2fvV5rR
Techniques to build, engage and manage your intranet project - Rebecca Jackson
Workshop delivered at Ark Intranets and Strategy, March 5, 2015.
As busy intranet teams with limited time and budget, making improvements to, or even rebuilding, an intranet can be a daunting prospect. In this workshop Rebecca will take you through a number of techniques you can apply yourself to help build, manage, and engage your staff in your intranet project.
- Overview of user experience and change management techniques to increase engagement
- Hands on activities to go in-depth into techniques such as card-sorting and personas
Speaker: Venkatesh Umaashankar
LinkedIn: https://www.linkedin.com/in/venkateshumaashankar/
What will be discussed?
What is Data Science?
Types of data scientists
What makes a Data Science Team? Who are its members?
Why does a DS team need a Full Stack Developer?
Who should lead the DS team?
Building a Data Science team in a Startup Vs Enterprise
Case studies on:
Evolution Of Airbnb’s DS Team
How Facebook on-boards DS team and trains them
Apple’s Acqui-hiring Strategy to build DS team
Spotify -‘Center of Excellence’ Model
Who should attend?
Managers
Technical Leaders who want to get started with Data Science
When it comes to AI and its applications, there are a number of myths being perpetuated by the mainstream media. It's time to dispel these myths because the opportunity to apply AI to your business is real.
Human-in-the-loop machine learning is an incredibly valuable design pattern used in real-world machine learning deployments across many types of applications. Taking the best of what humans can do and combining it with the best of what computers can do is practical and powerful. Not only that, but it mimics a strategy called active learning, forcing the training data collection process to become very efficient so the algorithm gets better and better.
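The human-in-the-loop pattern just described usually comes down to a confidence-based routing rule: confident predictions are automated, uncertain ones go to people, and the human answers flow back as training data. A minimal sketch with an invented threshold and batch:

```python
def route(prediction, confidence, threshold=0.8):
    """Return ('auto', label) for confident predictions, or ('human', None)
    when the item should be queued for a human annotator."""
    if confidence >= threshold:
        return ("auto", prediction)
    return ("human", None)   # the human's answer later feeds back into training

# Invented model outputs: (predicted label, confidence score).
batch = [("cat", 0.95), ("dog", 0.55), ("cat", 0.81)]
decisions = [route(p, c) for p, c in batch]
print(decisions)  # the 0.55 item goes to a human annotator
```

Tuning the threshold trades automation rate against error rate, which is exactly the lever these deployments adjust in production.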
Virtual Data Steward: Data Management 3.0 - CrowdFlower
Every company that is serious about data governance needs data stewards. Data stewards connect business information requirements and processes with information technology capabilities. This function is essential to bridging data management policies and standards to day-to-day operational practices.
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Notes on primitives for graph algorithms such as PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
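The float-vs-bfloat16 storage experiment in the list above hinges on one effect: bfloat16 keeps only the top 16 bits of a float32, so stored values lose mantissa precision even if the accumulation itself is exact. A small sketch that simulates bfloat16 storage by bit-masking (the values are invented; real benchmarks would use actual bfloat16 hardware types):

```python
import struct

def to_bfloat16(x):
    """Simulate bfloat16 storage: keep only the top 16 bits of the float32 pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y

values = [0.01] * 1000
exact = sum(values)                        # inputs stored at full precision
low = sum(to_bfloat16(v) for v in values)  # inputs stored as bfloat16
print(exact, low, abs(exact - low))        # the storage error accumulates
```

This is why the report compares storage types for the reduce: halving the storage width helps memory bandwidth, but the rounding of each stored element shows up directly in the sum.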
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these actions, organizations can create an empowered data analytics ecosystem that drives data-driven decisions and maximizes the return on their data investment.
Quantitative Data Analysis - Reliability Analysis (Cronbach Alpha), Common Method Bias - 2023240532
Quantitative Data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
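Cronbach's alpha, the reliability statistic listed above, is straightforward to compute from item-level scores. The sketch below assumes each survey item is supplied as one list of respondent scores; it is a generic implementation of the standard formula, not tied to any particular software package.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(total scores)),
# where k is the number of items. Sample variance is used consistently, so the
# ratio is the same as with population variance.

from statistics import variance

def cronbach_alpha(items):
    """items: list of columns, one list of respondent scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))
```

Two perfectly correlated items give alpha = 1.0; as items diverge, alpha drops toward (and below) zero.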
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... by John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptx by Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It also leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
How Oracle Uses CrowdFlower For Sentiment Analysis
1. How Oracle Uses CrowdFlower's Data Enrichment Platform For Sentiment Analysis
2. Before we get started
#RichData
The housekeeping items:
• Webinar slides, recording, and Q&A will be emailed
• Enter questions in the chat on the webinar panel
• Or ask your questions on Twitter - @CrowdFlower - using #RichData
3. Meet the Data Scientists
Randall Sparks
Principal Member of Technical Staff
Oracle Data Cloud — Social Platform Group
Pallika Kanani
Senior Research Staff Member
Oracle Labs
Lukas Biewald | @L2K
CEO and Founder
CrowdFlower
4. Overview: What will be covered today?
Insights (Why CrowdFlower?)
• Train and perfect your algorithms to build sentiment & other models that classify text
• Multiple language support
• World-wide contributor network
• Data enrichment capabilities
People-Powered Feedback
• Test Question Infrastructure
• Support for tracking contributor agreement and data quality
Use Cases
• Real examples of data collection and data modeling done by Oracle
5. Randall Sparks
• Oracle Data Cloud – Social Platform Group
• Use case: Social Media Analytics
• Data Collection, Data Modeling Process
• Use case: Multiple Languages
6. About Us
• Oracle Data Cloud — Social Platform Group
– Data service supporting multiple applications
– Monitoring & analysis of social media streams & other text sources
• Categorization of social media streams to topics + enrichments
– Keywords/phrases, semantic vectors (LSA)
• Enrichments
– Themes within a topic, related terms appearing in messages
– Demographics, location, indicators of intent, etc.
– Sentiment
• Social Relationship Management (SRM) product
7. What We Do
• Collect, filter, & analyze a large volume of streaming social media content from multiple content sources via multiple suppliers/aggregators
• Multiple (30+) languages — a big data collection challenge
• Process:
– Collect content streamed from multiple suppliers/aggregators
– Text filtering, normalization, tokenization, chunking, etc. (NLP)
– “Categorize” messages (match snippets to “Topics”)
– Topics: combinations of keywords/phrases + semantic filters, i.e. vector comparison of words & texts in “semantic space” using Latent Semantic Analysis (LSA)
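The semantic-filter idea (comparing words and texts as vectors in a "semantic space") can be sketched with a truncated SVD over a term-document count matrix. The three-document corpus and the choice of k = 2 below are toy assumptions standing in for the production LSA pipeline over real social-media snippets.

```python
# LSA sketch: build a term-document matrix, truncate its SVD,
# and compare documents by cosine similarity in the reduced space.

import numpy as np

def lsa_vectors(docs, k=2):
    """Project documents into a k-dimensional semantic space."""
    vocab = sorted({w for d in docs for w in d.split()})
    # term-document count matrix (terms x docs)
    A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T * s[:k]  # one k-dim vector per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Documents sharing vocabulary ("subway train station" vs. "train station platform") land close together in the reduced space, while an unrelated document ("sandwich restaurant menu") scores near zero, which is what lets a semantic filter separate the transit sense of "subway" from the restaurant sense.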
8. Use Case: Social Media Analytics
Keywords/phrases + Semantic filters
13. Use Case: Social Media Analytics — Example View
• Media Types of matched “snippets”
14. Why Do We Need Sentiment Data?
• Train a sentiment model (machine learning)
– Training data: 1000s of human-annotated items
– Features: words
• also: n-grams, phrases, known negation/intensification patterns, etc.
• punctuation, emoticons, emoji, other metadata
– Various algorithms:
• Decision Trees, Logistic Regression, Support Vector Machines (SVM), etc.
• Analyze the model
– Held-out test set
– Accuracy, precision/recall, etc.
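The training step above can be sketched end to end. This toy example uses a hand-rolled multinomial Naive Bayes over bag-of-words features as a stand-in for the decision-tree / logistic-regression / SVM models the slide names; the four-example training set is purely illustrative, where real jobs use thousands of human-annotated items.

```python
# Bag-of-words sentiment classification via multinomial Naive Bayes
# with Laplace smoothing (a simple stand-in for SVM / logistic regression).

import math
from collections import Counter

def train_nb(examples):
    """examples: list of (text, label). Returns (log-priors, per-label word log-probs)."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    n = len(examples)
    priors = {lb: math.log(cnt / n) for lb, cnt in label_counts.items()}
    logprobs = {}
    for lb, counts in word_counts.items():
        total = sum(counts.values()) + len(vocab)  # Laplace smoothing
        logprobs[lb] = {w: math.log((counts[w] + 1) / total) for w in vocab}
        logprobs[lb]["<unk>"] = math.log(1 / total)  # unseen-word fallback
    return priors, logprobs

def predict(model, text):
    """Pick the label maximizing log-prior + sum of word log-likelihoods."""
    priors, logprobs = model
    words = text.lower().split()
    scores = {
        lb: priors[lb] + sum(logprobs[lb].get(w, logprobs[lb]["<unk>"]) for w in words)
        for lb in priors
    }
    return max(scores, key=scores.get)
```

Accuracy and precision/recall would then be measured on a held-out test set that the model never saw during training, exactly as the slide describes.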
15. Data Collection & Modeling Process
• Generate “gold” test item data:
– Transform into (our) standard format for upload to CrowdFlower
– Define a CrowdFlower job to generate test questions & upload data
– Run the job & download results
– Select “gold” test items based on analysis of contributor agreement
16. Data Collection & Modeling Process (continued)
• Generate full training & test data sets:
– Define the main CrowdFlower job, upload data & test items
– Launch & monitor the job (remove problematic test questions)
– Download & analyze results
– Select (high-agreement) items for ML sentiment model training
– Build the sentiment model, test, & deploy
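The high-agreement selection step can be sketched as a simple majority-vote filter. The (item_id, label) judgment format and the 0.7 threshold below are illustrative assumptions, not CrowdFlower's actual aggregation logic.

```python
# Keep only items whose majority label reaches a minimum agreement rate,
# so that the downstream model trains on reliably labeled examples.

from collections import Counter, defaultdict

def high_agreement_items(judgments, threshold=0.7):
    """judgments: iterable of (item_id, label). Returns {item_id: majority label}."""
    by_item = defaultdict(list)
    for item_id, label in judgments:
        by_item[item_id].append(label)
    selected = {}
    for item_id, labels in by_item.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= threshold:
            selected[item_id] = label
    return selected
```

Items whose contributors split three ways fall below the threshold and are dropped rather than fed into model training.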
19. Pallika Kanani
• About Oracle Labs
• Power of human-annotated data
• Use case – Language understanding
• Use case – Wisdom of the crowd
• Use case – Data quality
20. Information Retrieval and Machine Learning Group
• Strong research program, publications
• Develop core Information Retrieval, Statistical Natural Language Processing, and Machine Learning technologies
• Help solve complex and challenging business problems across Oracle
• Utilize the CrowdFlower platform for a wide variety of relevance ranking and NLP problems
21. Data Annotation
• First step in building a search / NLP / machine learning application
• Many machine learning techniques require some human-annotated data
• Even for unsupervised methods, annotated data is needed for proper evaluation
22. Use Case: Language Understanding
• Goal: Get a better understanding of what our customers are talking about
• Extract useful information from raw text
• Language is all about context: disambiguating extracted information is crucial, and people are good at understanding context
– Are people talking about the New York subway or Subway, the restaurant?
23. CrowdFlower as a Data Enrichment Platform
Before:
• Data collection for machine learning used to be tedious
– Long iterations, typically lasting weeks and months
– Prohibitively high costs
– Difficult to innovate; overfitting to existing corpora
After:
• Try out new tasks at previously unimaginable speed
• Designing a job for a new NLP task takes as little as a day; getting results can be a matter of hours
• Rapid prototyping thanks to affordable cost for early trials (and final data collection)
24. Rapid Feedback
• Rapid debugging of the data collection process
• Works like debugging software, with humans in the loop
25. Wisdom of the Crowd
• Incorrect test questions due to lack of knowledge of pop culture
• The crowd set me straight: “’Say Something’ is the name of a song. Please fix your test question”
26. Data Quality
• Good-quality data even for tricky tasks
• Example: Ran a task for finding relevant URLs from Wikipedia and got excellent results
28. What’s next?
• Look out for a follow-up email with a copy of these slides, a recording of the webinar, a Q&A recap, and other fun stuff
• View and share this presentation on SlideShare
– Follow us for more such events
• Next webinar:
– CrowdFlower User Webinar: Graphical Editor and Visual Reports
– September 10th, 2015 – 10:00 AM PST
– Register at: http://www.crowdflower.com/events
29. Rich Data Summit
What is Rich Data Summit?
The leading conference for data scientists focused on turning big data into rich, meaningful data
• Data Scientists: 300+
• Sessions focused on Data Science: 5
• Hands-on Workshops: 9
Qualified webinar attendees will receive a 30% discount coupon
Interested? Email us at conference@crowdflower.com
www.richdatasummit.com
@RichDataSummit