This is the first time I introduced the concept of Schema-on-Read vs Schema-on-Write to the public. It was at Berkeley EECS RAD Lab retreat Open Mic Session on May 28th, 2009 at Santa Cruz, California.
This is the first time I introduced the concept of Schema-on-Read vs Schema-on-Write to the public. It was at Berkeley EECS RAD Lab retreat Open Mic Session on May 28th, 2009 at Santa Cruz, California.
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
This presentation provides a brief insight into a Big Data platform using the Hadoop ecosystem.
To this end the presentation will touch on:
-views of the Big Data ecosystem and it’s components
-an example of a Hadoop cluster
-considerations when selecting a Hadoop distribution
-some of the Hadoop distributions available
-a recommended Hadoop distribution
The strategic relationship between Hortonworks and SAP enables SAP to resell Hortonworks Data Platform (HDP) and provide enterprise support for their global customer base. This means SAP customers can incorporate enterprise Hadoop as a complement within a data architecture that includes SAP HANA and SAP BusinessObjects enabling a broad range of new analytic applications.
This presentation discusses the follow topics
What is Hadoop?
Need for Hadoop
History of Hadoop
Hadoop Overview
Advantages and Disadvantages of Hadoop
Hadoop Distributed File System
Comparing: RDBMS vs. Hadoop
Advantages and Disadvantages of HDFS
Hadoop frameworks
Modules of Hadoop frameworks
Features of 'Hadoop‘
Hadoop Analytics Tools
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UKhuguk
This session will give you an update on what SUSE is up to in the Big Data arena. We will take a brief look at SUSE Linux Enterprise Server and why it makes the perfect foundation for your Hadoop Deployment.
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Processing by "Sampat Kumar" from "Harman". The presentation was done at #doppa17 DevOps++ Global Summit 2017. All the copyrights are reserved with the author
SQL on Hadoop
Looking for the correct tool for your SQL-on-Hadoop use case?
There is a long list of alternatives to choose from; how to select the correct tool?
The tool selection is always based on use case requirements.
Read more on alternatives and our recommendations.
The strategic relationship between Hortonworks and SAP enables SAP to resell Hortonworks Data Platform (HDP) and provide enterprise support for their global customer base. This means SAP customers can incorporate enterprise Hadoop as a complement within a data architecture that includes SAP HANA, Sybase and SAP BusinessObjects enabling a broad range of new analytic applications.
In this webinar, we'll:
-Examine the key drivers and use cases for High Availability, performance and scalability for Apache Hadoop.
-Walk through an overview of reference architecture for a Non-Stop Hadoop implementation.
-Show how you can get started with Non-Stop Hadoop with the Hortonworks Data Platform.
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Cloudera, Inc.
As Hadoop graduates from pilot project to a mission critical component of the enterprise IT infrastructure, integrating information held in Hadoop and in Enterprise RDBMS becomes imperative. We’ll look at key scenarios driving Hadoop and RDBMS integration and review technical options. In particular, we’ll deep dive into the Apache SQOOP project, which expedites data movement between Hadoop and any JDBC database, as well as providing an framework which allows developers and vendors to create connectors optimized for specific targets such as Oracle, Netezza etc.
This is part of an introductory course to Big Data Tools for Artificial Intelligence. These slides introduce students to Apache Hadoop, DFS, and Map Reduce.
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
This presentation provides a brief insight into a Big Data platform using the Hadoop ecosystem.
To this end the presentation will touch on:
-views of the Big Data ecosystem and it’s components
-an example of a Hadoop cluster
-considerations when selecting a Hadoop distribution
-some of the Hadoop distributions available
-a recommended Hadoop distribution
The strategic relationship between Hortonworks and SAP enables SAP to resell Hortonworks Data Platform (HDP) and provide enterprise support for their global customer base. This means SAP customers can incorporate enterprise Hadoop as a complement within a data architecture that includes SAP HANA and SAP BusinessObjects enabling a broad range of new analytic applications.
This presentation discusses the follow topics
What is Hadoop?
Need for Hadoop
History of Hadoop
Hadoop Overview
Advantages and Disadvantages of Hadoop
Hadoop Distributed File System
Comparing: RDBMS vs. Hadoop
Advantages and Disadvantages of HDFS
Hadoop frameworks
Modules of Hadoop frameworks
Features of 'Hadoop‘
Hadoop Analytics Tools
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UKhuguk
This session will give you an update on what SUSE is up to in the Big Data arena. We will take a brief look at SUSE Linux Enterprise Server and why it makes the perfect foundation for your Hadoop Deployment.
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Processing by "Sampat Kumar" from "Harman". The presentation was done at #doppa17 DevOps++ Global Summit 2017. All the copyrights are reserved with the author
SQL on Hadoop
Looking for the correct tool for your SQL-on-Hadoop use case?
There is a long list of alternatives to choose from; how to select the correct tool?
The tool selection is always based on use case requirements.
Read more on alternatives and our recommendations.
The strategic relationship between Hortonworks and SAP enables SAP to resell Hortonworks Data Platform (HDP) and provide enterprise support for their global customer base. This means SAP customers can incorporate enterprise Hadoop as a complement within a data architecture that includes SAP HANA, Sybase and SAP BusinessObjects enabling a broad range of new analytic applications.
In this webinar, we'll:
-Examine the key drivers and use cases for High Availability, performance and scalability for Apache Hadoop.
-Walk through an overview of reference architecture for a Non-Stop Hadoop implementation.
-Show how you can get started with Non-Stop Hadoop with the Hortonworks Data Platform.
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Cloudera, Inc.
As Hadoop graduates from pilot project to a mission critical component of the enterprise IT infrastructure, integrating information held in Hadoop and in Enterprise RDBMS becomes imperative. We’ll look at key scenarios driving Hadoop and RDBMS integration and review technical options. In particular, we’ll deep dive into the Apache SQOOP project, which expedites data movement between Hadoop and any JDBC database, as well as providing an framework which allows developers and vendors to create connectors optimized for specific targets such as Oracle, Netezza etc.
This is part of an introductory course to Big Data Tools for Artificial Intelligence. These slides introduce students to Apache Hadoop, DFS, and Map Reduce.
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Data Con LA
The team at Fandango heartily embraced NoSQL, using Couchbase to power a key media publishing system. The initial implementation was fraught with integration issues and high latency, and required a major effort to successfully refactor. My talk will outline the key organizational and architectural decisions that created deep systemic problems, and the steps taken to re-architect the system to achieve a high level of performance at scale.
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...Data Con LA
Apache Solr makes it so easy to interactively visualize and explore your data. Create a dashboard, add some facets, select some values, cross it with the time… and just look at the results.
Apache Spark is the growing framework for performing streaming computations, which makes it ideal for real time indexing.
Solr also comes with new Analytics Facets which are a major weapon added to the arsenal of the data explorer. They bring another dimension: calculations. We can now do the equivalent of SQL, just in a much simpler and faster way. These calculations can operate over buckets of data. For example, it is now possible to see the sum of Web traffic by country over the time, the median price of some categories of products, which ads are bringing more money by location...
This talks puts in practice some of the leading features of Solr Search. It presents the main types of facets/stats and which advanced properties and usage make them shine. A demo in parallel with the open source Search App in Hue will demonstrate how these facets can power interactive widgets or your own analytic queries. The data will be indexed in real time from a live stream with Spark.
Kiji cassandra la june 2014 - v02 clint-kellyData Con LA
Big Data Camp LA 2014, Don't re-invent the Big-Data Wheel, Building real-time, Big Data applications on Cassandra with the open-source Kiji project by Clint Kelly of Wibidata
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
More and more organizations are turning to Hadoop and NoSQL to manage big data. In fact, many IT professionals consider each of those terms to be synonymous with big data. At the same time, these two technologies are seen as different beasts that handle different challenges. That means they are often deployed in a rather disjointed way, even when intended to solve the same overarching business problem. The emerging trend of “in-Hadoop databases” promises to narrow the deployment gap between them and enable new enterprise applications. In this talk, Dale will describe that integrated architecture and how customers have deployed it to benefit both the technical and the business teams.
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Data Con LA
During my time working on attribution and ingest systems, I've encountered several different approaches to solving the simple question: "How do I get data from A to B". In this session, I'd like to share some of the problems I've encountered and how to effectively solve them.
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...Data Con LA
NoSQL has exploded on the developer scene promising alternatives to RDBMS that make rapidly developing, Internet scale applications easier than ever. However, as a trade off to the ease of development and scale, some of the familiarity with other well-known query interfaces such as SQL, has been lost. Until now that is...N1QL (pronounced ‘N1QL’) is a SQL like query language for querying JSON, which brings the familiarity of RDBMS back to the NoSQL world. In this session you will learn about the syntax and basics of this new language as well as Integration with the Couchbase SDKs.
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...Data Con LA
Investigated a couple audio based, deep learning strategies for identifying human vocalized car sounds. In one case Mel Frequency Cepstral Coefficients, MFCCs, were used as inputs into a supervised, logistic regression neural network. In a separate case, Short Term Fourier Transforms ,STFT, were used to generate PCA whitened spectograms, which were used as inputs into a supervised, convolutional neural network. The MFCC method trained quickly on a relative small dataset of 4 sounds. The STFT method resulted in a much larger input matrix, resulting in much longer times for converging onto a solution
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Data Con LA
Online decision making over time needs interacting with an ever changing environment and underlying machine learning models need to change and adapt to this changing environment. This talk discusses class of machine learning algorithms and provides details of how the computation is parallelized using the Spark framework.
Hadoop essentials by shiva achari - sample chapterShiva Achari
Sample chapter of Hadoop Ecosystem
Delve into the key concepts of Hadoop and get a thorough understanding of the Hadoop ecosystem
For more information: http://bit.ly/1AeruBR
At APTRON Delhi, we believe in hands-on learning. That's why our Hadoop training in Delhi is designed to give you practical experience working with Hadoop. You'll work on real-world projects and learn from experienced instructors who have worked with Hadoop in the industry.
https://bit.ly/3NnvsHH
Presented By :- Rahul Sharma
B-Tech (Cloud Technology & Information Security)
2nd Year 4th Sem.
Poornima University (I.Nurture),Jaipur
www.facebook.com/rahulsharmarh18
This slide gives a simple and purposeful knowledge about popular Hadoop platforms.
From simple definition to importance of Hadoop in modern era the presentation also introduces Hadoop service providers along with its core components.
Do go through it once and comment below with your feedback. I am sure that this slide will help many in presenting basics of Hadoop for their projects or business purpose.
The crisp information has been generated after going through detailed information available on internet as well as research papers
The course additionally covers Configuring, Deploying, and Maintaining a Hadoop Cluster. The Hadoop Admin coaching is concentrated on sensible active exercises and encourages open discussions of however folk’s area unit exploitation Hadoop in enterprises managing massive knowledge sets.
Hadoop as a Service ( as offered by handful of niche vendors now ) is a cloud computing solution that makes medium and large-scale data processing accessible, easy, fast and inexpensive. This is achieved by eliminating the operational challenges of running Hadoop, so one can focus on business growth.
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
Financial services companies can reap tremendous benefits from 'Big Data' and they have moved quickly to deploy it. But these companies also place heavy demands on 'Big Data' infrastructure for flexibility, reliability and performance. In this webinar, Hortonworks joins WANDisco to look at three examples of using 'Big Data' to get a more comprehensive view of customer behavior and activity in the banking and insurance industries. Then we'll pull out the common threads from these examples, and see how a flexible next-generation Hadoop architecture lets you get a step up on improving your business performance. Join us to learn:
How to leverage data from across an entire global enterprise
How to analyze a wide variety of structured and unstructured data to get quick, meaningful answers to critical questions
What industry leaders have put in place
Hybrid Data Warehouse Hadoop ImplementationsDavid Portnoy
Data Warehouse vendors are evolving to incorporate the best Hadoop has to offer. Similarly, the Hadoop ecosystem is growing to include capabilities previously available only to large scale (MPP) DW platforms.
E2Matrix Jalandhar provides Best Big Data training based on current industry standards that helps attendees to secure placements in their dream jobs at MNCs. E2Matrix Provides Best Big Data Training in Jalandhar Amritsar Ludhiana Phagwara Mohali Chandigarh. E2Matrix is one of the best Big Data training institute offering hands on practical knowledge. At E2Matrix Big Data training is conducted by subject specialist corporate professionals best experience in managing real-time Big Data projects. E2Matrix implements a blend of academic learning and practical sessions to give the student optimum exposure. At E2Matrix’s well-equipped Big Data training Institute aspirants learn the skills for Big Data Overview, Use Cases, Data Analytics Process, Data Preparation, Tools for Data Preparation, Hands on Exercise : Using SQL and NoSql DB's, Hands on Exercise : Usage of Tools, Data Analysis Introduction, Classification, Data Visualization using R, Automation Testing Training on real time projects.
E2Matrix Jalandhar provides Best Big Data training based on current industry standards that helps attendees to secure placements in their dream jobs at MNCs. E2Matrix Provides Best Big Data Training in Jalandhar Amritsar Ludhiana Phagwara Mohali Chandigarh. E2Matrix is one of the best Big Data training institute offering hands on practical knowledge. At E2Matrix Big Data training is conducted by subject specialist corporate professionals best experience in managing real-time Big Data projects. E2Matrix implements a blend of academic learning and practical sessions to give the student optimum exposure. At E2Matrix’s well-equipped Big Data training Institute aspirants learn the skills for Big Data Overview, Use Cases, Data Analytics Process, Data Preparation, Tools for Data Preparation, Hands on Exercise : Using SQL and NoSql DB's, Hands on Exercise : Usage of Tools, Data Analysis Introduction, Classification, Data Visualization using R, Automation Testing Training on real time projects.
E2Matrix Jalandhar provides Best Big Data training based on current industry standards that helps attendees to secure placements in their dream jobs at MNCs. E2Matrix Provides Best Big Data Training in Jalandhar Amritsar Ludhiana Phagwara Mohali Chandigarh. E2Matrix is one of the best Big Data training institute offering hands on practical knowledge. At E2Matrix Big Data training is conducted by subject specialist corporate professionals best experience in managing real-time Big Data projects. E2Matrix implements a blend of academic learning and practical sessions to give the student optimum exposure. At E2Matrix’s well-equipped Big Data training Institute aspirants learn the skills for Big Data Overview, Use Cases, Data Analytics Process, Data Preparation, Tools for Data Preparation, Hands on Exercise : Using SQL and NoSql DB's, Hands on Exercise : Usage of Tools, Data Analysis Introduction, Classification, Data Visualization using R, Automation Testing Training on real time projects.
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
This is Mark Ledbetter's presentation from the September 22, 2014 Hortonworks webinar “What’s Possible with a Modern Data Architecture?” Mark is vice president for industry solutions at Hortonworks. He has more than twenty-five years experience in the software industry with a focus on Retail and supply chain.
BIG Data & Hadoop Applications in Social MediaSkillspeed
Explore the applications of BIG Data & Hadoop in Social Media via Skillspeed.
BIG Data & Hadoop in Social Media is a key differentiator, especially in terms of generating memorable customer experiences.
Herein, we discuss how leading social networks such as Facebook, Twitter, Pinterest, LinkedIN, Instagram & Stumble Upon utilize Hadoop.
To get more details regarding BIG Data & Hadoop, please visit - www.SkillSpeed.com
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
Mike Limcaco, Analytics Specialist / Customer Engineer at Google
Measure trends in a particular topic or search term across Google Search across the US down to the city-level. Integrate these data signals into analytic pipelines to drive product, retail, media (video, audio, digital content) recommendations tailored to your audience segment. We'll discuss how Google unique datasets can be used with Google Cloud smart analytic services to process, enrich and surface the most relevant product or content that matches the ever-changing interests of your local customer segment.
Melinda Thielbar, Data Science Practice Lead and Director of Data Science at Fidelity Investments
From corporations to governments to private individuals, most of the AI community has recognized the growing need to incorporate ethics into the development and maintenance of AI models. Much of the current discussion, though, is meant for leaders and managers. This talk is directed to data scientists, data engineers, ML Ops specialists, and anyone else who is responsible for the hands-on, day-to-day of work building, productionalizing, and maintaining AI models. We'll give a short overview of the business case for why technical AI expertise is critical to developing an AI Ethics strategy. Then we'll discuss the technical problems that cause AI models to behave unethically, how to detect problems at all phases of model development, and the tools and techniques that are available to support technical teams in Ethical AI development.
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
Antje Barth, Principal Developer Advocate, AI/ML at AWS & Chris Fregly, Principal Engineer, AI & ML at AWS
The frequency and severity of natural disasters are increasing. In response, governments, businesses, nonprofits, and international organizations are placing more emphasis on disaster preparedness and response. Many organizations are accelerating their efforts to make their data publicly available for others to use. Repositories such as the Registry of Open Data on AWS and Humanitarian Data Exchange contain troves of data available for use by developers, data scientists, and machine learning practitioners. In this session, see how a community of developers came together though the AWS Disaster Response hackathon to build models to support natural disaster preparedness and response.
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a Developer Data Platform. Come learn what�s new in the 6.0 release and Atlas following all the recent announcements made at MongoDB World 2022. Topics will include
- Atlas Search which combines 3 systems into one (database, search engine, and sync mechanisms) letting you focus on your product's differentiation.
- Atlas Data Federation to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake and AWS S3 buckets
- Queryable Encryption lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator which analyzes your existing relational schemas and helps you design a new MongoDB schema.
- And more!
Data Con LA 2022 - Real world consumer segmentationData Con LA
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm.
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transform from ""segment 1"", ""segment 2"", etc. to something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did team across Shopkick change their approach given what Analytics had discovered.
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
Ravi Pillala, Chief Data Architect & Distinguished Engineer at Intuit
TurboTax is one of the well known consumer software brand which at its peak serves 385K+ concurrent users. In this session, We start with looking at how user behavioral data & tax domain events are captured in real time using the event bus and analyzed to drive real time personalization with various TurboTax data pipelines. We will also look at solutions performing analytics which make use of these events, with the help of Kafka, Apache Flink, Apache Beam, Spark, Amazon S3, Amazon EMR, Redshift, Athena and Amazon lambda functions. Finally, we look at how SageMaker is used to create the TurboTax model to predict if a customer is at risk or needs help.
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
George Mansoor, Chief Information Systems Officer at California State University
Overview of the CSU Data Architecture on moving on-prem ERP data to the AWS Cloud at scale using Delphix for Data Replication/Virtualization and AWS Data Migration Service (DMS) for data extracts
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
Anand Ranganathan, Chief AI Officer at Unscrambl
Conversational AI is getting more and more widely used for customer support and employee support use-cases. In this session, I'm going to talk about how it can be extended for data analysis and data science use-cases ... i.e., how users can interact with a bot to ask analytical questions on data in relational databases.
This allows users to explore complex datasets using a combination of text and voice questions, in natural language, and then get back results in a combination of natural language and visualizations. Furthermore, it allows collaborative exploration of data by a group of users in a channel in platforms like Microsoft Teams, Slack or Google Chat.
For example, a group of users in a channel can ask questions to a bot in plain English like ""How many cases of Covid were there in the last 2 months by state and gender"" or ""Why did the number of deaths from Covid increase in May 2022"", and jointly look at the results that come back. This facilitates data awareness, data-driven collaboration and joint decision making among teams in enterprises and outside.
In this talk, I'll describe how we can bring together various features including natural-language understanding, NL-to-SQL translation, dialog management, data story-telling, semantic modeling of data and augmented analytics to facilitate collaborate exploration of data using conversational AI.
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
Anil Inamdar, VP & Head of Data Solutions at Instaclustr
The most modernized enterprises utilize polyglot architecture, applying the best-suited database technologies to each of their organization's particular use cases. To successfully implement such an architecture, though, you need a thorough knowledge of the expansive NoSQL data technologies now available.
Attendees of this Data Con LA presentation will come away with:
-- A solid understanding of the decision-making process that should go into vetting NoSQL technologies and how to plan out their data modernization initiatives and migrations.
-- They will learn the types of functionality that best match the strengths of NoSQL key-value stores, graph databases, columnar databases, document-type databases, time-series databases, and more.
-- Attendees will also understand how to navigate database technology licensing concerns, and to recognize the types of vendors they'll encounter across the NoSQL ecosystem. This includes sniffing out open-core vendors that may advertise as “open source,"" but are driven by a business model that hinges on achieving proprietary lock-in.
-- Attendees will also learn to determine if vendors offer open-code solutions that apply restrictive licensing, or if they support true open source technologies like Hadoop, Cassandra, Kafka, OpenSearch, Redis, Spark, and many more that offer total portability and true freedom of use.
Data Con LA 2022 - Intro to Data ScienceData Con LA
Zia Khan, Computer Systems Analyst and Data Scientist at LearningFuze
Data Science tutorial is designed for people who are new to Data Science. This is a beginner level session so no prior coding or technical knowledge is required. Just bring your laptop with WiFi capability. The session starts with a review of what is data science, the amount of data we generate and how companies are using that data to get insight. We will pick a business use case, define the data science process, followed by hands-on lab using python and Jupyter notebook. During the hands-on portion we will work with pandas, numpy, matplotlib and sklearn modules and use a machine learning algorithm to approach the business use case.
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
Mariana Danilovic, Managing Director at Infiom, LLC
We will address:
(1) Community creation and engagement using tokens and NFTs
(2) Organization of DAO structures and ways to incentivize Web3 communities
(3) DeFi business models applied to Web3 ventures
(4) Why Metaverse matters for new entertainment and community engagement models.
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
Curtis ODell, Global Director Data Integrity at Tricentis
Join me to learn about a new end-to-end data testing approach designed for modern data pipelines that fills dangerous gaps left by traditional data management tools—one designed to handle structured and unstructured data from any source. You'll hear how you can use unique automation technology to reach up to 90 percent test coverage rates and deliver trustworthy analytical and operational data at scale. Several real world use cases from major banks/finance, insurance, health analytics, and Snowflake examples will be presented.
Key Learning Objective
1. Data journeys are complex and you have to ensure integrity of the data end to end across this journey from source to end reporting for compliance
2. Data Management tools do not test data, they profile and monitor at best, and leave serious gaps in your data testing coverage
3. Automation with integration to DevOps and DataOps' CI/CD processes are key to solving this.
4. How this approach has impact in your vertical
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
Arif Ansari, Professor at University of Southern California
Super Bowl Ad cost $7 million and each year a few Super Bowl ads go viral. The traditional A/B testing does not predict virality. Some highly shared ones reach over 60 million organic views, which can be more valuable than views on TV. Not only are these voluntary, but they are typically without distraction, and win viewer engagement in the form of likes, comments, or shares. A Super Bowl ad that wins 69 million views on YouTube (e.g., Alexa Mind Reader) costs less than 10 cents per quality view! However, the challenge is triggering virality. We developed a method to predict virality and engineer virality into Ads.
1. Prof. Gerard J. Tellis and co-authors recommended that advertisers use YouTube to tease, test, and tweak (TTT) their ads to maximize sharing and viewing. 2022 saw that maxim put into practice.
2. We developed viral Ads prediction using two scientific models:
a. Prof. Gerard Tellis et al.'s model for viral prediction
b. Deep Learning viral prediction using social media effect
3. The model was able to identify all the top 15 Viral Ads it performed better than the traditional agencies.
4. New proposed method is Tease, Test, Tweak, Target and Spots Ad.
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
Jai Bansal, Senior Manager, Data Science at Aetna
This talk describes an internal data product called Member Embeddings that facilitates modeling of member medical journeys with machine learning.
Medical claims are the key data source we use to understand health journeys at Aetna. Claims are the data artifacts that result from our members' interactions with the healthcare system. Claims contain data like the amount the provider billed, the place of service, and provider specialty. The primary medical information in a claim is represented in codes that indicate the diagnoses, procedures, or drugs for which a member was billed. These codes give us a semi-structured view into the medical reason for each claim and so contain rich information about members' health journeys. However, since the codes themselves are categorical and high-dimensional (10K cardinality), it's challenging to extract insight or predictive power directly from the raw codes on a claim.
To transform claim codes into a more useful format for machine learning, we turned to the concept of embeddings. Word embeddings are widely used in natural language processing to provide numeric vector representations of individual words.
We use a similar approach with our claims data. We treat each claim code as a word or token and use embedding algorithms to learn lower-dimensional vector representations that preserve the original high-dimensional semantic meaning.
This process converts the categorical features into dense numeric representations. In our case, we use sequences of anonymized member claim diagnosis, procedure, and drug codes as training data. We tested a variety of algorithms to learn embeddings for each type of claim code.
We found that the trained embeddings showed relationships between codes that were reasonable from the point of view of subject matter experts. In addition, using the embeddings to predict future healthcare-related events outperformed other basic features, making this tool an easy way to improve predictive model performance and save data scientist time.
Data Con LA 2022 - Data Streaming with KafkaData Con LA
Jie Chen, Manager Advisory, KPMG
Data is the new oil. However, many organizations have fragmented data in siloed line of businesses. In this topic, we will focus on identifying the legacy patterns and their limitations and introducing the new patterns packed by Kafka's core design ideas. The goal is to tirelessly pursue better solutions for organizations to overcome the bottleneck in data pipelines and modernize the digital assets for ready to scale their businesses. In summary, we will walk through three uses cases, recommend Dos and Donts, Take aways for Data Engineers, Data Scientist, Data architect in developing forefront data oriented skills.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
Hadoop Innovation Summit 2014
1. THE FUTURE OF
HADOOP: CHOOSING
THE RIGHT OPTIONS
Subash D’Souza
Hadoop Innovation Summit
2014
2. WHO AM I?
Recognized as a Champion of Big Data by Cloudera
Co-Organizer - Los Angeles Hadoop User Group
Organizer - Los Angeles HBase User Group
Organizer – Los Angeles Big Data Users Group
Organizer - Big Data Camp LA
Speaker – Big Data Camp LA 2013
Leading a BOF Session at Hadoop Summit Europe 2014
Author – HBase Developer’s Cookbook (Out Fall 2014)
Technical Reviewer – Apache Flume: Distributed Log Collection for Hadoop
3. HADOOP: OLD & NEW
Hadoop first released in 2006.
Based on the GFS and MapReduce papers released by Google
Ever since adoption has been massive and rapid
Companies like Facebook, Netflix, EBay, Yahoo, Expedia, Spotify and even the
Social Security Administration are adopting Hadoop
Hadoop 2.0 AKA YARN went GA in September of 2013
Is backwards compatible with Hadoop 1.0 API’s
Replaced Jobtracker and Tasktrackers with Application Master, Resource Manager
and Node Managers
4. A BRIEF HISTORY
Google
releases GFS
paper
2002
2003
Google
releases
MapReduce
paper
2004
Nutch adds
distributed
file system
Doug Cutting
launches
Nutch project
MapR
founded
2005
Hortonworks
founded
Cloudera
founded
2006
2007
Hadoop spun
out of Nutch
project at
Yahoo
MapReduce
implemented
in Nutch
Stinger/ Tez
to be
released
Hadoop 2.0
w/HA
available
2008
2009
2010
2011
Hadoop
breaks
Terasort
world record
2012
2013
2014
YARN goes
GA
HBase, Zookee
per, Flume and
more added to
CDH
Impala
(SQL on
Hadoop)
launched
5. PREVIOUSLY, THE STATE OF
DATA
As a data analyst, previously, you were not able to
ask questions you wanted to ask because you did
not have the data points available
Corollary, you couldn’t think of questions to ask of
your data because you didn’t know you had access
to those data points
7. FOCUS
No standard way to get to the data
This is a plus and minus, plus because there is variety to choose from, minus because the
no. of tools to pull the data is huge and evermore expanding
As a company what do you choose?
What do you focus on?
Question – Do you replace your current data
infrastructure or do you augment it?
16. CHOICES
Hortonworks – Completely Open Source – Everything on their platform is available
from Apache Hadoop Distribution. Available as a free download or with paid
support.
Cloudera – Offers the open source Apache Hadoop Distribution as well as
management tools built for the Cloudera Distribution. Available as a free download
or with paid support with the additional tools
MapR – Offers a version of Hadoop that replaces the HDFS with a proprietary
MFS(MapR File System). Everything else on their stack is based on the open
source Apache distribution. Offers a free M3 version along with paid M5 and M7
versions.
17. ADVANTAGES OF YARN
Ability to handle multi tenant clients, i.e. running
multiple
applications
atop
the
same
framework(multi-tenancy)
Splits the work of Job tracker into Resource
Manager and Application master so Job tracker
does not have to allocate resources as well as
manage the tasks
Ability to restart Jobs from the place where they
failed
Scales well beyond the limitations of MR1(4000
22. SQL ON HADOOP VS.
TRADITIONAL RDBMS
Data on Hadoop is not as responsive as a RDBMS
Data in Hadoop can scale much better than an
RDBMS
Data in Hadoop can be accessed using a variety of
mechanisms such as Hive, Imapala, Drill, etc. i.e.
the query engines are abstracted from the
Hadoop(HDFS) storage layer. The same cannot be
said of RDBMS where you would need between
one system to another example, Oracle cannot pull
from SQL Server and vice versa
23. QUESTION?
Do we augment or replace our current data
infrastructure?
Answer – Augment
Why? – combine the best of both worlds, use
aggregated data in your data stores and all the
detail data and lifetime in Hadoop
Of course, you will different SLA’s based on the
query you ask.
25. STARTUPS VS. MATURE
Startups that are in data should make the
consideration of going with YARN to gain the
advantages of YARN
Mature companies tend to be conservative and
hence will look to the more established use cases of
MR1
Startups and Mature companies should look at the
advantages of YARN as well as applying more near
real-time sql-on-hadoop
26. GETTING STARTED WITH
HADOOP VS. ESTABLISHED
HADOOP PRACTICES
Getting started with Hadoop – Opportunity to get off
the ground running YARN plus bleeding edge
technologies.
Established companies with a Hadoop practice tend
to be conservative but that shouldn’t prevent them
from coming with a migration plan to YARN
27. REAL TIME ANALYTICS
Kiji
HBase
Storm
Shark
Redshift
Impala
Stinger
Drill
Accumolo
Presto
Hawq
IBM BigSQL
32. FUTURE OF HADOOP: YARN &
NEAR REAL TIME SQL-ONHADOOP
Multi Tenancy
HA(High Availability)
Tools for SQL-On-Hadoop
Impala
Stinger/Tez
Drill
Shark
33. WHAT DO YOU CHOOSE?
The choices are huge
The toolsets are varied
First focus on the problems you are trying to solve. Don’t
choose Hadoop because it is the latest buzz word. Make
sure there is a real need to solve
Focus on developers and administrators and ensure that
whatever toolset you choose, they have the relevant
skillset or training will be provided or relevant resources will
be brought in from outside( whether through hiring or
consulting)
REMEMBER PROBLEMSET!!! i.e what you are trying to
34. CAVEATS
Work still being done on bringing real time sql-onhadoop to YARN.
Impala has Llama for this.
Stinger for Hive Preview is currently available
HBase on YARN(HOYA) is also actively being
worked on.
Since YARN is a low level API, some abstraction is
needed which is available with tools such as Samza
and Weave
35. BIG DATA = BIG IMPACT
Ken Rudin, Director of Analytics, Facebook
“You need to go the last mile and evangelize your
insights so that people actually act on them and
there is impact."
“It doesn’t matter how brilliant our analyses are. If
nothing changes we have made no impact”
36. GIVING BACK
Hadoop is an open source project
Work done on this and the ecosystem tools are by
committers and contributors, some of whom do this in
their own personal time, in reporting and fixing bugs as
well as new functionality.
Please
give
back
either
by
becoming
a
contributor(Testing, filing bugs) or getting out your use
case for Hadoop(at meetups and/or conferences such
as this one) so others can make use of the issues you
have faced as well see the rapid adoption of the