The Vera Institute of Justice (Vera) partnered with Two Sigma’s Data Clinic, a volunteer-based program that leverages employees’ data science expertise, to uncover the factors contributing to continued jail growth in rural areas.
Neighborhood Real Estate Report by Michael Mahoney Realtor (Michael Mahoney)
Here is the most recent real estate market report on homes in Norwood, MA from Michael Mahoney - Realtor
http://www.realtormikemahoney.com/idx/?idx-q-Locations=Norwood
Southwick, MA 01077 - NEIGHBORHOOD REPORT - May 2017 (Lesley Lambert)
Get the current housing statistics, sales history, population information, economic statistics, quality of life and more on the town of Southwick, MA 01077
www.westernmahomes.net
#southwick #westernma #westernmass #realestate
GIS is a discipline that heavily relies on data. In this presentation we highlight all the geospatial data sources for crime mapping.
Visit https://expertwritinghelp.com/gis-assignment-help/ for quality gis assignment aid
Southwick, MA 01077 Real Estate Market Report | November 2021 | Lesley Lamber... (Lesley Lambert)
Southwick, MA 01077 Real Estate Market Report | November 2021 | Lesley Lambert, Southwick REALTOR
Are you curious about the real estate market in Southwick, MA? This report shows the changes in the market over the past twelve months.
Lesley Lambert, Southwick REALTOR with Park Square Realty
413-575-3611
www.westernmahomes.net
Westfield, MA 01085 Real Estate Market Report January / February 2017 (Lesley Lambert)
Listing inventory is at a low point right now, making it a great time to get your Westfield, MA home on the market! Days on market are down and prices are nosing up. Take a look at the full real estate market report for Westfield, MA created by Lesley Lambert, Westfield REALTOR with Park Square Realty. Find out what homes are selling for in Westfield, MA and see what is currently on the market in Westfield.
www.westernmahomes.net
Westfield, MA 01085 NEIGHBORHOOD REPORT May 2017 (Lesley Lambert)
Get the current housing statistics, sales history, population information, economic statistics, quality of life and more on the City of Westfield, MA 01085
www.westernmahomes.net
#westfield #westernma #westernmass #realestate
Presentation from the OECD Roundtable on Equal Access to Justice, Latvia, 2018. For more information see: http://www.oecd.org/gov/equal-access-to-justice-oecd-expert-roundtable-latvia-2018.htm
Chapter 2 Minitab Express You are asked to analyze the da.docx (walterl4)
Chapter 2 Minitab Express
You are asked to analyze the data to see how income, the number of active primary care physicians, and crime vary by region of the state. Let’s locate these variables in the dataset:
· Income: there are 2 income variables: C40 Mean Household Income, C41 Median Household Income
· Number of Active Primary Care Physicians: located in C10
· Crime: there are 3 crime variables: C22 Violent Crime, C23 Criminal Cases Superior Court, C24 Crime Index
· Region of the state: located in C37
You should read the description of each variable so you understand what data is being summarized (look in Course Resources for the County Data Set Definitions File). You will select one income variable and one crime variable to include in your report along with the active primary care physicians variable.
To analyze the data, you will summarize each variable by region of the state, using the Minitab Express commands Statistics, Describe, Descriptive Statistics.
In the next window, you’ll select the variable to summarize (income, primary care physicians, or crime). I will use C12 social security beneficiaries. You will also select the Region variable in the Group variable window. You will need to run this command separately for each variable in order to use the Group variable option.
You can also select the statistics that you want to display. For this assignment, select Mean, Minimum, Maximum, N nonmissing, and N missing. You can also select a Display.
Here is my output:
What do I see in the results? First, I notice that the number of counties in each region varies from 21 (mountain) to 49 (coast). For social security beneficiaries, the plain counties have the highest mean and the highest maximum, while the mountain counties have the lowest mean. It is interesting to note that although the coast counties do not have the lowest mean, they do have the lowest minimum.
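For readers working outside Minitab, the same grouped summary (Mean, Minimum, Maximum, N nonmissing, N missing per region) can be sketched in plain Python. The county values and region labels below are made-up placeholders, not the course data set, and `describe_by_group` is a hypothetical helper rather than a Minitab feature.

```python
from statistics import mean

def describe_by_group(values, groups):
    """Per-group Mean, Minimum, Maximum, N nonmissing and N missing.

    values and groups are parallel lists; None marks a missing value."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    summary = {}
    for group, vals in by_group.items():
        present = [v for v in vals if v is not None]
        summary[group] = {
            "Mean": mean(present) if present else None,
            "Minimum": min(present) if present else None,
            "Maximum": max(present) if present else None,
            "N nonmissing": len(present),
            "N missing": len(vals) - len(present),
        }
    return summary

# Made-up beneficiary counts for six counties and their regions.
values = [120, 95, None, 300, 80, 150]
regions = ["coast", "coast", "coast", "plain", "mountain", "plain"]
for region, stats in describe_by_group(values, regions).items():
    print(region, stats)
```

As in the Minitab output, a missing value is counted under N missing but excluded from the mean, minimum, and maximum.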
NETRUSH
Background & History
Product list/Marketing/Advertising/Market Share
Financials:
Private Company
Netrush has been a private company since its launch.
In 2016, they had a very successful year for sales, coming in at an estimated 2.79 million.
In 2017, they had a slump in sales, coming in at an estimated 1.29 million.
In 2018, they had no growth. Their sales for 2018 stayed at an estimated 1.29 million.
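The size of the 2017 slump follows directly from the two estimates above as a percentage change (in the same, unstated, units as the estimates):

```latex
\frac{1.29 - 2.79}{2.79} \times 100\% \approx -53.8\%
```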
Competitors:
“Trend Nation is an e-commerce multi channel retailer that develops, manufactures, markets and sell unique products to customers”
“We’ll elevate your brand’s ecommerce success and drive rapid revenue growth on Amazon, Walmart, eBay, and beyond”
Trend Nation and etailz are Rushnation’s biggest competitors. Like Rushnation, both companies specialize in e-commerce and focus on selling on Amazon and other online platforms.
SWOT Analysis
Innovation & Technology / Philanthropy
Example Sample menus Mac & Cheese · A box of mac & cheese · Br... (BetseyCalderon89)
Example: Sample menus
Mac & Cheese
· A box of mac & cheese
· Broccoli
· Potatoes
Vegetarian Chili
· TVP (textured vegetable protein)
· Chili packet
· Onions/ green pepper
· Kidney beans (small can or packet)
Tuna/ Noodles
· Elbow noodles or spiral noodles
· Tuna packets
· Alfredo sauce packet (or other sauce packet)
Tortillas/ Burritos
· Flour tortillas
· Zucchini & squash
· Grated cheese (cheddar and pepper jack)
· Onions/ peppers
· Taco sauce packet
Curry Vegetables
· Potatoes
· Peppers & onions
· Cauliflower
· Curry packets or curry seasoning
Pizza Bagels
· Bagels
· Pizza sauce packets
· Pepperoni
· Grated cheese
City of Bravos EMS System
St. Louis, Missouri
Table of Contents
Executive Summary (Mission, Vision, Values)
Problem Definition
Assessment of Critical Factors (Demographics)
Intervention Strategy
Stakeholders
Organization Objectives
Budget
Annual Budget Overview
Year 1 Master Budget
SWOT Analysis
Administrative Plan
Operational Plan
System Status Plan
1, 3, 5 Year Plan
14 Attributes
Plan Evaluation
Professional Experience
References
Appendix
Executive Summary
Mission: The goal of the City of Bravos’ EMS Bureau is to provide the citizens and visitors of the City of Bravos with the highest quality of pre-hospital emergency care possible by employing the highest quality providers, utilizing the best equipment, incorporating continuous quality improvement, and providing education for the community.
Vision: The City of Bravos’ EMS Bureau plans to keep expanding to meet the growing needs of the community it serves and to implement a progressive community paramedicine system.
Values:
· Professionalism: Treat all people with dignity, honesty, and respect to ensure patient satisfaction and quality care
· Empathy: Treat every patient with respect, recognizing that they are the ones experiencing the emergency
· Trust: Building a good rapport and relationship with the community leads to the highest quality of care
· Safety: Go above and beyond to recognize any safety concerns and act to minimize them, ensuring quality care and good patient outcomes
Problem Definition
After the old system’s vehicles were repainted and new uniforms were introduced, both made with deep purple dye from the local snails, many of the employees became ill and passed away. It was determined that the wrong snail had been used and the dye derived from it was actually poisonous, killing all of the old system’s employees and rendering their vehicles inoperable. Following this tragic loss, the Bravos’ EMS system proposes a whole new EMS system that will not only meet the needs of the public but also exceed them.
Assessment of Critical Factors
Demographics of St. Louis, Missouri (Bravos)
Missouri is one of the largest states in the United States, and St. Louis is considered to be the second-largest city in Missouri. St. Louis is located on the Great Plains o ...
PowerPoint Overview - Juvenile Arrests and Neighborhood Characteristics - Pow... (stoughne)
This is the major project required during the completion of my Graduate Certificate in Geographic Information Sciences.
Please read through it. You will find it interesting as a writing sample and as an example of the types of data analysis and research I produce.
The State of Open Data on School Bullying (Two Sigma)
How much of a problem is school bullying in NYC? The answer depends on who you ask. Data Clinic volunteers compared local surveys (where many students say bullying is happening) with federal data (where a majority of schools report zero incidents) to analyze these disparities for the 2013-14 school year. To present this work, the Data Clinic hosted an event as part of NYC’s Open Data Week, featuring a presentation of the analysis and a panel discussion with researchers, advocates, and journalists to better understand this important student safety issue.
Halite is an open-source artificial intelligence programming competition, created by Two Sigma, in which players build bots in the programming language of their choice to battle on a two-dimensional virtual board. Halite II, running on GCP, supported about 6,000 active players from about 100 countries and 1,000 institutions over a three-month period. The presentation surveys the principles needed for a successful AI programming competition and describes the architecture of the game environment, particularly how GCP supported 12 million game executions written in over 20 programming languages. Among other topics, this talk illustrates the approaches taken to security and scalability, and the considerations needed to allow machine learning bots to place in the top 50 results.
Similar to Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incarceration Rates in the U.S.
BeakerX is a collection of kernels and extensions to the Jupyter interactive computing platform. Its major features are: 1) JVM kernel support, including Java, Scala, Groovy, Clojure, Kotlin, and SQL (the kernels are built from a shared base kernel that includes magics and classpath support); 2) a collection of interactive widgets for time-series plots, tables, and forms, with APIs for our JVM languages plus Python and JavaScript; 3) prototype autotranslation for polyglot programming; 4) one-click publication, including interactive widgets; and 5) a data browser with drag-and-drop into the notebook. The presentation will include a demo of BeakerX and a discussion of its history and its relationship to its predecessor, the Beaker Notebook.
Engineering with Open Source - Hyonjee Joo (Two Sigma)
Engineering systems using open source solutions can be a powerful way to leverage existing technology. However, not all open source solutions are made or supported equally, and it’s important to choose what you use carefully. In this talk, we’ll walk through building a metrics system for a high performance data platform, taking a look at some of the important factors to consider when choosing what open source offerings to use.
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson (Two Sigma)
The NERF and Heads projects bring Linux back to cloud servers' boot ROMs by replacing nearly all of the vendor firmware with a reproducibly built Linux runtime that acts as a fast, flexible, and measured boot loader. It has been years since modern servers supported Free Firmware options like LinuxBIOS or coreboot, and as a result, server and cloud security has depended on unreviewable, closed-source, proprietary vendor firmware of questionable quality. With Heads on NERF, we are making it possible to take back control of our systems with Open Source Software from very early in the boot process, helping build a more trustworthy and secure cloud.
Waiter: An Open-Source Distributed Auto-Scaler (Two Sigma)
One of the key challenges in developing a service-oriented architecture (SOA) is anticipating traffic patterns and scaling the number of running instances of services to meet demand. In many situations, it’s hard to know how much traffic a service will receive and when that traffic will come. A service may see no requests for several days in a row and then suddenly see thousands of requests per second. If developers underestimate peak traffic, their service can quickly become overwhelmed and unresponsive, and may even crash, resulting in constant human intervention and poor developer productivity. On the other hand, if they provision sufficient capacity upfront, the resources they allocate will be completely wasted when there’s no traffic. To allow for better resource utilization, many cluster management platforms provide auto-scaling features. These features tend to auto-scale at the machine/resource level (as opposed to the request level) or by deferring to logic in the application layer. A better approach would be to run services when, and only when, there is traffic. Waiter is a distributed auto-scaler that delivers this kind of request-level auto-scaling. It requires no input or handling from applications and is agnostic to underlying cluster managers; it currently uses Mesos, but can easily run on top of Kubernetes or other solutions. Another challenge with SOAs is enabling the evolution of service implementations without breaking downstream customers. On this front, Waiter supports service versioning for downstream consumers by running multiple, individually addressable versions of services. It automatically manages service lifecycles and reaps older versions after a period of inactivity. With a variety of unique features, Waiter is a compelling platform for applications across a broad range of industries.
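"Measured" boot generally means that each stage hashes what it is about to run and folds that hash into a TPM Platform Configuration Register (PCR), so the final register value commits to everything that was loaded, in order. The sketch below illustrates only the SHA-256 extend arithmetic in plain Python; it does not talk to a real TPM, and the stage contents are placeholders, not what Heads or NERF actually measure.

```python
import hashlib

PCR_SIZE = 32  # SHA-256 digest length in bytes

def pcr_extend(pcr: bytes, measured_data: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measured data))."""
    digest = hashlib.sha256(measured_data).digest()
    return hashlib.sha256(pcr + digest).digest()

def measure_boot_chain(stages):
    """Fold each boot stage, in order, into a PCR that starts as all zeros."""
    pcr = bytes(PCR_SIZE)
    for stage in stages:
        pcr = pcr_extend(pcr, stage)
    return pcr

# Hypothetical stage contents standing in for real firmware images.
good = measure_boot_chain([b"linux-kernel-image", b"initrd", b"boot-config"])
tampered = measure_boot_chain([b"linux-kernel-image", b"initrd-modified", b"boot-config"])
print(good.hex())
print(good != tampered)  # True
```

Because a PCR can only be extended, never set directly, changing any stage (or the order of stages) produces a different final value, which a verifier can compare against a known-good measurement.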
Existing web services can run on Waiter without modification as long as they communicate over HTTP and support the transmission of client requests to arbitrary backends. Two Sigma has employed the platform in a variety of critical production contexts for over two years, with use cases rising to hundreds of millions of requests per day.
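To make "request-level" scaling concrete, here is a minimal Python sketch of the two decisions described above: how many instances a service version needs given its in-flight requests (scaling to zero when idle), and which versions to reap after a period of inactivity. The concurrency-based formula, the field names, and the helpers are hypothetical illustrations, not Waiter's actual algorithm or API.

```python
import math
from dataclasses import dataclass

@dataclass
class ServiceVersion:
    """Hypothetical per-version state; not Waiter's actual data model."""
    concurrency_per_instance: int  # requests one instance can serve at once
    max_instances: int
    idle_timeout_s: float          # reap the version after this much inactivity
    last_request_at: float = 0.0

def desired_instances(svc: ServiceVersion, outstanding_requests: int) -> int:
    """Request-level target: enough instances for in-flight requests, zero when idle."""
    if outstanding_requests <= 0:
        return 0
    needed = math.ceil(outstanding_requests / svc.concurrency_per_instance)
    return min(needed, svc.max_instances)

def reap_idle_versions(versions: dict, now: float) -> dict:
    """Keep only versions that received a request within their idle timeout."""
    return {name: svc for name, svc in versions.items()
            if now - svc.last_request_at < svc.idle_timeout_s}

versions = {
    "v1": ServiceVersion(4, 10, idle_timeout_s=3600, last_request_at=0),
    "v2": ServiceVersion(4, 10, idle_timeout_s=3600, last_request_at=5000),
}
print(desired_instances(versions["v2"], 0))    # 0: no traffic, no instances
print(desired_instances(versions["v2"], 9))    # 3
print(desired_instances(versions["v2"], 500))  # 10: capped at max_instances
print(sorted(reap_idle_versions(versions, now=6000)))  # ['v2']
```

A production scaler would also smooth these targets over time, so that a brief lull does not immediately tear down every instance.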
Responsive and Scalable Real-time Data Analytics for SHPE 2017 - Cecilia Ye (Two Sigma)
Designing a system that can extract immediate insights from large amounts of data in real-time requires a special way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that can continuously compute analytics on-the-fly. It also highlights a case study as an example of reactive design in action.
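One small building block of a "reactive" design is an operator that updates its result incrementally when each event arrives and pushes the new value to subscribers, instead of recomputing over the full history. The sketch uses Welford's online mean/variance as an illustrative analytic; the class name, callback signature, and price stream are assumptions, not the system described in the talk.

```python
from typing import Callable, List

class RunningStats:
    """Incrementally updated mean/variance (Welford's method) that pushes
    each new result to subscribers as soon as an event arrives."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self._subscribers: List[Callable[[int, float, float], None]] = []

    def subscribe(self, callback: Callable[[int, float, float], None]) -> None:
        self._subscribers.append(callback)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until at least two events arrive.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    def on_event(self, x: float) -> None:
        # O(1) update per event, no matter how long the stream has run.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        for callback in self._subscribers:
            callback(self.count, self.mean, self.variance)

stats = RunningStats()
updates = []
stats.subscribe(lambda n, m, v: updates.append((n, m, v)))
for price in [10.0, 12.0, 11.0, 13.0]:
    stats.on_event(price)
print(updates[-1])  # (4, 11.5, 1.6666666666666667)
```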
Archival Storage at Two Sigma - Josh Leners (Two Sigma)
This talk is about archival storage at Two Sigma. We begin by presenting CelFS, Two Sigma’s geo-distributed file system, which has been in deployment for over ten years. Although CelFS has scaled to serve tens of petabytes of data, it uses physical partitioning to provide quality-of-service guarantees, has a high replication overhead, and cannot take advantage of outsourced cold storage (e.g., Amazon’s Glacier or Google’s Coldline). In the rest of the talk, we describe our response to these limitations: Jaks, a new storage system designed to reduce the TCO of CelFS and to serve as the backend for other systems at Two Sigma. We also discuss how we hedge risk when changing such a foundational system.
Smooth Storage - A distributed storage system for managing structured time se... (Two Sigma)
Smooth is a distributed storage system for managing structured time series data at Two Sigma. Smooth’s design emphasizes scale (in terms of both size and aggregate request bandwidth), reliability, and storage efficiency. It is optimized for large parallel streaming read/write accesses over provided time ranges. Smooth has a clear separation between the metadata and data layers, and supports multiple pluggable object stores for storing data files. Data can be replicated or moved between different stores and data centers to support availability, performance, and storage tiering objectives. Smooth is widely used at Two Sigma by various applications, including modeling research workflows, data pipelines, and various data analysis jobs. Smooth has been in development for about 5 years, currently stores multiple PBs of compressed data, and serves peak aggregate throughput in excess of 100 GB/s. In this talk I will discuss the design and implementation of Smooth, our experience running it over the past two years, ongoing challenges, and future directions.
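To illustrate what a clean split between metadata and data layers can look like, here is a toy Python sketch: a metadata index records which time range each data file covers and which pluggable store holds it, and a range read consults the metadata first, then fetches only the overlapping files. All class and method names are hypothetical; this is not Smooth's API, and it ignores replication, tiering policy, and concurrency.

```python
from typing import Dict, List, Protocol, Tuple

Row = Tuple[int, float]  # (timestamp, value)

class ObjectStore(Protocol):
    def get(self, key: str) -> List[Row]: ...
    def put(self, key: str, rows: List[Row]) -> None: ...

class InMemoryStore:
    """Stand-in for a pluggable object store (file system, cloud bucket, ...)."""
    def __init__(self) -> None:
        self._objects: Dict[str, List[Row]] = {}
    def get(self, key: str) -> List[Row]:
        return self._objects[key]
    def put(self, key: str, rows: List[Row]) -> None:
        self._objects[key] = list(rows)

class TimeSeriesTable:
    """Metadata layer: maps each file's [start, end) time range to (store, key)."""
    def __init__(self, stores: Dict[str, ObjectStore]) -> None:
        self.stores = stores
        self.files: List[Tuple[int, int, str, str]] = []

    def write(self, store_name: str, key: str, rows: List[Row]) -> None:
        rows = sorted(rows)  # assumes a non-empty batch
        self.stores[store_name].put(key, rows)  # data layer
        self.files.append((rows[0][0], rows[-1][0] + 1, store_name, key))  # metadata
        self.files.sort()

    def read(self, t0: int, t1: int) -> List[Row]:
        """Rows with t0 <= timestamp < t1, touching only overlapping files."""
        out: List[Row] = []
        for start, end, store_name, key in self.files:
            if start < t1 and end > t0:
                out.extend(r for r in self.stores[store_name].get(key) if t0 <= r[0] < t1)
        return sorted(out)

table = TimeSeriesTable({"fast": InMemoryStore(), "cold": InMemoryStore()})
table.write("cold", "2020/part-0", [(1, 0.5), (2, 0.7)])
table.write("fast", "2021/part-0", [(10, 1.5), (11, 1.7), (12, 1.9)])
print(table.read(2, 11))  # [(2, 0.7), (10, 1.5)]
```

Because readers only see the metadata index, a file can be moved to a different store by copying it and updating one metadata entry, without changing the read path.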
Whether your data's in MySQL, a NoSQL, or somewhere in the cloud, you're likely paying decent money for storage and IOPS. With ever-growing data volumes, and the need for SSDs to cut latency and replication to provide insurance, your storage footprint is an important place to look for savings. It makes sense, then, that so many storage vendors tout compression as a key metric and differentiator.
The language vendors and users employ to reason about storage footprint and compression is embarrassingly vague, if not meaningless or downright deceptive, but we can do better, and we must do better.
This presentation discusses each part of the durable storage stack, from the hardware on up, and how usage numbers can take on different meanings at each layer. It covers what's important to know at each layer, and how to think about and talk about concepts like compression, fragmentation, write amplification, and wear leveling. Finally, it examines different ways benchmarketers can present data deceptively, and provides some techniques for identifying and cutting through those kinds of misrepresentations.
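A small worked example shows how the same measurement can be reported in ways that sound very different. The helper names and byte counts are hypothetical; the definitions used (compression ratio as logical/stored size, space savings as 1 - stored/logical, write amplification as device writes per host write) are common conventions, but vendors do not always apply them consistently.

```python
GIB = 1024 ** 3

def compression_ratio(logical_bytes: int, physical_bytes: int) -> float:
    """Logical size divided by stored size; 4.0 is usually quoted as '4:1'."""
    return logical_bytes / physical_bytes

def space_savings(logical_bytes: int, physical_bytes: int) -> float:
    """Fraction of space saved: 1 - stored / logical."""
    return 1 - physical_bytes / logical_bytes

def write_amplification(device_bytes_written: int, host_bytes_written: int) -> float:
    """Bytes the device physically wrote per byte the host asked it to write."""
    return device_bytes_written / host_bytes_written

logical, stored = 100 * GIB, 25 * GIB
print(compression_ratio(logical, stored))  # 4.0
print(space_savings(logical, stored))      # 0.75
# Going from 4:1 to 8:1 doubles the ratio but adds only 12.5 points of savings.
print(space_savings(logical, logical // 8) - space_savings(logical, stored))  # 0.125
print(write_amplification(30 * GIB, 10 * GIB))  # 3.0
```

The last two lines show why a headline number needs its layer and definition attached: "twice the compression" saves far less than twice the space, and every logical byte written here costs the device three.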
Identifying Emergent Behaviors in Complex Systems - Jane Adams (Two Sigma)
Forager ants in the Arizona desert have a problem: after leaving the nest, they don’t return until they’ve found food. On the hottest and driest days, this means many ants will die before finding food, let alone before bringing it back to the nest. Honeybees also have a problem: even small deviations from 35°C in the brood nest can lead to brood death, malformed wings, susceptibility to pesticides, and suboptimal divisions of labor within the hive. All ants in the colony coordinate to minimize the number of forager ants lost while maximizing the amount of food foraged, and all bees in the hive coordinate to keep the brood nest temperature constant in changing environmental temperatures.
The solutions realized by each system are necessarily decentralized and abstract: no single ant or bee coordinates the others, and the solutions must withstand the loss of individual ants and bees and extend to new ants and bees. They focus on simple yet essential features and capabilities of each ant and bee, and use them to great effect. In this sense, they are incredibly elegant.
In this talk, we’ll examine a handful of natural and computer systems to illustrate how to cast system-wide problems into solutions at the individual component level, yielding incredibly simple algorithms for incredibly complex collective behaviors.
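To make the idea of casting a colony-level goal into individual rules concrete, here is a toy response-threshold model of brood-nest thermoregulation. Response-threshold models are a common device in the collective-behavior literature, but this one is an illustration, not necessarily a model covered in the talk, and every parameter (heat-leak rate, per-bee effect, threshold spread) is a made-up assumption chosen only so the example behaves sensibly.

```python
def simulate_brood_temperature(ambient, thresholds, steps=200, target=35.0,
                               leak=0.1, effect=0.005):
    """Toy response-threshold model of brood-nest thermoregulation.

    Every bee senses only the local temperature and compares its deviation
    from the target with its own threshold: it fans (cools) when the nest is
    too hot and shivers (heats) when it is too cold. No bee coordinates others.
    """
    temp = target
    for _ in range(steps):
        temp += leak * (ambient - temp)           # heat exchange with outside air
        deviation = temp - target
        fanning = sum(1 for t in thresholds if deviation > t)
        shivering = sum(1 for t in thresholds if -deviation > t)
        temp += effect * (shivering - fanning)    # summed effect of individual bees
    return temp

# 100 bees with heterogeneous thresholds from 0.01 to 1.00 degrees C (made up).
thresholds = [i / 100 for i in range(1, 101)]
print(round(simulate_brood_temperature(40.0, thresholds), 2))  # hot day: stays near 35
print(round(simulate_brood_temperature(30.0, thresholds), 2))  # cool day: stays near 35
print(round(simulate_brood_temperature(40.0, []), 2))          # no bees: drifts to 40
```

Spreading the thresholds means the larger the deviation, the more bees respond, so the colony produces a graded correction without any bee measuring the colony-wide response, and losing a few bees weakens the correction slightly rather than breaking it.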
Algorithmic Data Science = Theory + Practice (Two Sigma)
Obtaining actionable insights from large datasets requires the use of methods that must be, at once, fast, scalable, and statistically sound. This is the subject of algorithmic data science, a discipline at the border of computer science and statistics. In this talk I outline the fundamental questions that motivate research in this area, present a general framework to formulate many problems in this field, introduce the challenges in balancing theoretical and statistical correctness with practical efficiency, and show how sampling-based algorithms are extremely effective at striking the right balance in many situations, giving examples from social network analysis and pattern mining. I will conclude with some research directions and areas for future exploration.
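As a minimal illustration of the sampling idea (not the specific algorithms from the talk), the sketch estimates the fraction of items satisfying a property from a uniform random sample, sizing the sample with Hoeffding's inequality so the estimate is within epsilon of the truth with probability at least 1 - delta. The function names and the example predicate are hypothetical.

```python
import math
import random

def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Samples needed so a [0, 1]-valued mean is within epsilon w.p. >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def estimate_fraction(items, predicate, epsilon=0.05, delta=0.05, seed=None):
    """Estimate the fraction of items satisfying predicate from a uniform sample
    drawn with replacement (so the samples are independent)."""
    rng = random.Random(seed)
    n = hoeffding_sample_size(epsilon, delta)
    hits = sum(1 for _ in range(n) if predicate(rng.choice(items)))
    return hits / n

items = list(range(1_000_000))
est = estimate_fraction(items, lambda x: x % 10 < 3, seed=7)  # true fraction: 0.3
print(hoeffding_sample_size(0.05, 0.05))  # 738
print(est)
```

The sample size depends only on epsilon and delta, not on the number of items, which is what lets this kind of estimator scale to very large datasets.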
Improving Python and Spark Performance and Interoperability with Apache Arrow (Two Sigma)
Apache Arrow-based interconnection between various big data tools (SQL, UDFs, machine learning, big data frameworks, etc.) enables you to use them together seamlessly and efficiently.
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix... (Two Sigma)
The authors present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions.
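The core of the insertion-only member of the suite (TRIÈST-BASE, as described in the paper) keeps a fixed-size reservoir sample of M edges, maintains exact global and local triangle counts for the sampled subgraph, and after t edges rescales them by max(1, t(t-1)(t-2) / (M(M-1)(M-2))). The Python below is a sketch of that idea written from the published description, not the authors' code; the fully-dynamic variant, which also processes deletions, is not shown, and the class and method names are hypothetical.

```python
import random
from collections import defaultdict

class TriestBase:
    """Reservoir-sampling triangle counter for an insertion-only edge stream."""

    def __init__(self, memory: int, seed=None):
        if memory < 6:
            raise ValueError("memory budget must be at least 6 edges")
        self.M = memory
        self.t = 0                       # edges seen so far
        self.edges = []                  # sampled edges (list allows O(1) eviction)
        self.adj = defaultdict(set)      # adjacency of the sampled subgraph
        self.tau = 0                     # triangles in the sampled subgraph
        self.tau_local = defaultdict(int)
        self.rng = random.Random(seed)

    def _update_counters(self, delta, u, v):
        # Each common neighbour c closes the triangle (u, v, c) in the sample.
        for c in self.adj[u] & self.adj[v]:
            self.tau += delta
            for node in (u, v, c):
                self.tau_local[node] += delta

    def _sample_edge(self):
        if self.t <= self.M:
            return True
        if self.rng.random() < self.M / self.t:
            # Evict a uniformly random sampled edge and forget its triangles.
            i = self.rng.randrange(len(self.edges))
            x, y = self.edges[i]
            self.edges[i] = self.edges[-1]
            self.edges.pop()
            self.adj[x].discard(y)
            self.adj[y].discard(x)
            self._update_counters(-1, x, y)
            return True
        return False

    def add_edge(self, u, v):
        self.t += 1
        if self._sample_edge():
            self.edges.append((u, v))
            self.adj[u].add(v)
            self.adj[v].add(u)
            self._update_counters(+1, u, v)

    def _scale(self):
        t, M = self.t, self.M
        return max(1.0, t * (t - 1) * (t - 2) / (M * (M - 1) * (M - 2)))

    def global_estimate(self):
        return self._scale() * self.tau

    def local_estimate(self, node):
        return self._scale() * self.tau_local.get(node, 0)

counter = TriestBase(memory=10, seed=42)
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # complete graph on 4 nodes
for u, v in k4:
    counter.add_edge(u, v)
print(counter.global_estimate())  # 4.0: memory covers every edge, so the count is exact
print(counter.local_estimate(0))  # 3.0
```

With a memory budget smaller than the stream, the sampled counts are rescaled into estimates instead of exact values; memory use stays bounded by M edges regardless of stream length.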
Graph Summarization with Quality Guarantees (Two Sigma)
Given a large graph, the authors aim to produce a concise lossy representation (a summary) that can be stored in main memory and used to approximately answer queries about the original graph much faster than by using the exact representation.
An overview of Rademacher Averages, a fundamental concept from statistical learning theory that can be used to derive uniform, sample-dependent bounds on the deviation of sample averages from their expectations.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
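One common way to represent such a summary is to group nodes into supernodes and store, for each pair of supernodes, only the fraction of possible edges that are present; an adjacency query then returns that density as an approximate answer. The sketch below shows this representation and two queries on a hand-picked partition. How to choose a good partition, and the quality guarantees, are the substance of the paper and are not reproduced here; the function names are hypothetical.

```python
from collections import defaultdict

def summarize(edges, partition):
    """Lossy summary: one density value per pair of supernodes.

    partition maps each node to its supernode label; the result maps a
    sorted (label, label) pair to the fraction of possible edges present."""
    sizes = defaultdict(int)
    for label in partition.values():
        sizes[label] += 1
    counts = defaultdict(int)
    for u, v in edges:
        counts[tuple(sorted((partition[u], partition[v])))] += 1
    density = {}
    for (a, b), c in counts.items():
        possible = sizes[a] * (sizes[a] - 1) // 2 if a == b else sizes[a] * sizes[b]
        density[(a, b)] = c / possible
    return density

def edge_weight(summary, partition, u, v):
    """Approximate adjacency query: density between u's and v's supernodes."""
    return summary.get(tuple(sorted((partition[u], partition[v]))), 0.0)

def approx_degree(summary, partition, u):
    """Expected degree of u under the summary."""
    return sum(edge_weight(summary, partition, u, v) for v in partition if v != u)

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5), (3, 4)]
partition = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
summary = summarize(edges, partition)
print(summary)  # {('A', 'A'): 1.0, ('B', 'B'): 1.0, ('A', 'B'): 0.125}
print(edge_weight(summary, partition, 0, 4))  # 0.125
print(approx_degree(summary, partition, 0))   # 3.25 (true degree: 3)
```

Here eight edges are replaced by three numbers; the price is that answers become expectations, such as the 0.25 error in node 0's degree.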
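The empirical Rademacher average of a function family F on a sample x_1, ..., x_m is E_sigma[ sup over f in F of (1/m) * sum_i sigma_i f(x_i) ], where the sigma_i are independent signs equal to +1 or -1 with probability 1/2 each; intuitively, it measures how well the family can correlate with random noise on this particular sample. The sketch below computes it for a small finite family, exactly by enumerating all sign vectors and approximately by Monte Carlo. Representing the family as a table of its values on the sample is an assumption made for illustration.

```python
import itertools
import random

def empirical_rademacher_exact(values):
    """values[j][i] = f_j(x_i). Exact expectation over all 2^m sign vectors
    (only feasible for small m)."""
    m = len(values[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        total += max(sum(s * v for s, v in zip(sigma, f)) / m for f in values)
    return total / 2 ** m

def empirical_rademacher_mc(values, trials=10_000, seed=None):
    """Monte Carlo estimate of the same quantity using random sign vectors."""
    rng = random.Random(seed)
    m = len(values[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * v for s, v in zip(sigma, f)) / m for f in values)
    return total / trials

single = [[1, 1]]        # one function evaluated on a sample of size m = 2
pair = [[1, 0], [0, 1]]  # two functions on the same sample
print(empirical_rademacher_exact(single))     # 0.0
print(empirical_rademacher_exact(pair))       # 0.25
print(empirical_rademacher_mc(pair, seed=1))  # close to 0.25
```

A single function cannot fit random signs on average, so its value is 0; adding a second function lets the supremum pick whichever fits better, which raises the average and, in the resulting bounds, widens the allowed deviation.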
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup and is sponsored by Zilliz, the maintainers of Milvus.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). At the same time, interest is growing in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach to LLM context and prompt augmentation when building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built a robust Data Copilot on these three concepts, one that can help democratize access to company data assets and boost the performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
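The retrieval step of a RAG-based SQL copilot can be illustrated in a few lines. Before the question reaches the LLM, the most relevant catalog entries (here, table descriptions) are retrieved and prepended to the prompt. Below is a minimal Python sketch, assuming a bag-of-words cosine similarity in place of the embedding model and vector store a real system would use. The table names, descriptions, and prompt wording are hypothetical, and this is not the architecture described in the talk.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm = (math.sqrt(sum(x * x for x in a.values()))
            * math.sqrt(sum(x * x for x in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=2):
    """Return the names of the k catalog entries most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(docs.items(),
                    key=lambda item: cosine(q, Counter(tokenize(item[1]))),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


def build_prompt(question, docs, k=2):
    """Augment the user question with retrieved schema context."""
    context = "\n".join(f"- {name}: {docs[name]}" for name in retrieve(question, docs, k))
    return f"Tables that may be relevant:\n{context}\n\nWrite a SQL query answering: {question}"
```

Limiting the prompt to the top-k tables keeps the context small. This matters most when the catalog holds far more tables than fit in the model's context window.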
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged has the potential to save iteration time. Skipping in-identical vertices (vertices with the same in-links) avoids duplicate computations and thus could reduce iteration time. Road networks often have chains that can be short-circuited before PageRank computation to improve performance, and the final ranks of chain nodes can be calculated easily. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are more popular now than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open-source time-series database designed for speed. We will also review the history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incarceration Rates in the U.S.
1. www.twosigma.com
Rachael Weiss Riley
Two Sigma Data Clinic
Bloomberg Data for Good Exchange
September 24, 2017
3. The U.S. has the world’s highest incarceration rate
4. Rate has gone down slightly in recent years
5. Mass Incarceration: the whole pie
Total incarceration: 2.3 million
State prisons: 1.3 million
Local jails: 630,000
Federal prisons: 197,000
Source: Prison Policy Initiative (2017)
6. Research focuses on state or federal prisons rather than local county jails
[Chart: Google Scholar results with the term in the paper title: "Prison" 86,500; "Jail" 8,690]
7. Vera launched the Incarceration Trends project to facilitate county-level jail research
8. Rural counties have higher local jail rates
[Chart: Local jail rate per 100,000 population by year, 2000 onward, for four county types: Rural; Small and Mid Metros; Large Metro, Suburban; Large Metro, Urban]
9. Research questions
1. What are the characteristics of a county that are associated with local jail incarceration rates?
2. In terms of local jail incarceration rates, how do counties compare to their peers?
10. The following variables were included in the final model:
Urban Code + Year + Other Characteristics
Urban Code: county’s metro/rural classification (4 categories)
Year: 2000 to 2013
Jail characteristics:
  Latinos as percent of total jail population
  Blacks as percent of total jail population
  Percentage of total jail population awaiting trial
  Inmates held for other counties (per 100,000) by county
  Inmates held for the state (per 100,000) by county
The outcome variable is the local jail population per 100,000 by county.
11. The following variables were included in the final model:
Urban Code + Year + Other Characteristics
Urban Code: county’s metro/rural classification (4 categories)
Year: 2000 to 2013
County characteristics:
  Hispanics as percent of county population
  Non-Hispanic blacks as percent of county population
  Percent of county living in poverty
  Percent unemployed in county
  County’s welfare spending (per 100,000)
  County’s police and corrections spending (per 100,000)
The outcome variable is the local jail population per 100,000 by county.
12. The following variables were included in the final model:
Urban Code + Year + Other Characteristics
Urban Code: county’s metro/rural classification (4 categories)
Year: 2000 to 2013
State characteristics:
  Percent of federal prisoners held in local county jails
  State’s total prison population (per 100,000)
The outcome variable is the local jail population per 100,000 by county.
13. A GEE model was specified to analyze the data
Starting point → Implication
3,145 counties with data from 1970–2013, but many had missing data → 2,858 counties: 1,783 rural; 690 small/mid metro; 338 suburban; 47 urban
Dependent variable represents counts → Poisson distribution (log-linear model)
Observations are not independent (multiple years for each county lead to correlation within counties) → Generalized estimating equations (GEE) model accounts for within-county correlations
Prior to 2000, census classifications differed and covariate data were lacking → 14 years of data (2000–2013)
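The slide names the method but not its settings. Below is a minimal Python sketch (using NumPy) of a Poisson GEE, assuming an independence working correlation; the slides do not say which working correlation was used. The point estimates solve the Poisson estimating equations. The standard errors come from a sandwich estimator that sums scores within each county before squaring, which is how within-county correlation enters. The function and argument names are hypothetical, and the design matrix would hold dummy columns for urban code and year plus the other covariates. A real analysis would more likely rely on an established implementation such as the `GEE` class in statsmodels.

```python
import numpy as np


def poisson_gee_independence(X, y, groups, max_iter=100, tol=1e-10):
    """Log-linear (Poisson) GEE with an independence working correlation.

    X: (n, p) design matrix whose first column is an intercept of ones.
    y: (n,) non-negative outcomes. groups: (n,) cluster ids, e.g. county.
    Returns (coefficients, cluster-robust sandwich standard errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # start at the overall mean
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                 # estimating equations
        info = X.T @ (X * mu[:, None])         # Fisher information, log link
        step = np.linalg.solve(info, score)    # Fisher scoring update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    cov = np.zeros_like(bread)
    for g in np.unique(groups):
        idx = groups == g
        # Scores are summed within a cluster, so correlated years in the
        # same county are not treated as independent evidence.
        u = bread @ (X[idx].T @ (y[idx] - mu[idx]))
        cov += np.outer(u, u)
    return beta, np.sqrt(np.diag(cov))
```

Under an independence working correlation, the coefficients match an ordinary Poisson regression, and only the standard errors change. Other working correlations, such as exchangeable, also change the estimates.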
14. We considered multiple nested models that included different combinations of variables
Urban Code + Year Effects:
Local jail rate ~ urban code + year
Urban Code + Year Effects + Other Characteristics:
Local jail rate ~ urban code + year + all additional variables
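Written out with the log link from the model specification, the two nested models can be sketched as follows. The notation is an assumption: Y_{ct} is the local jail population per 100,000 in county c and year t, and U_{ck} are indicator variables for three of the four urban-code categories relative to a reference category. δ_t are year effects, and x_{ct} collects the additional jail, county, and state characteristics. The slides do not state the exact coding.

```latex
\begin{aligned}
\text{Urban Code + Year:}\quad
  \log \mathbb{E}[Y_{ct}] &= \beta_0 + \sum_{k=1}^{3} \gamma_k U_{ck} + \delta_t \\
\text{Urban Code + Year + Other:}\quad
  \log \mathbb{E}[Y_{ct}] &= \beta_0 + \sum_{k=1}^{3} \gamma_k U_{ck} + \delta_t
  + \mathbf{x}_{ct}^{\top}\boldsymbol{\beta} \\
\text{percent change for category } k:\quad
  &100\,\bigl(e^{\gamma_k} - 1\bigr)
\end{aligned}
```

Comparing each γ_k between the two models shows how much of the urban/rural gap the added characteristics account for.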
15. Half of the urban/rural jail rate gap is explainable by our set of characteristics
[Chart: Percent change in local jail rates (rural vs. metro) for Rural vs. Large Metro, Suburban; Rural vs. Large Metro, Urban; and Rural vs. Small and Mid Metros, under the Urban Code + Year model and the Urban Code + Year + Other Characteristics model. Values shown: 34.6%, 31.4%, 8.1%, 14.8%, 11.7%, not significant.]
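In a log-linear model, a coefficient γ corresponds to a percent change of 100(e^γ − 1) in the expected rate. "How much of the gap is explained" can therefore be measured by how much the log-scale coefficient shrinks once covariates are added. Below is a minimal Python sketch of that arithmetic. The function names are hypothetical, and log scale is one reasonable choice of measure, not necessarily the one the authors used. The bar order did not survive extraction, so pairing 34.6% (base model) with 14.8% (full model) in the usage line is an illustrative assumption.

```python
import math


def pct_change(coef):
    """Percent change in the expected rate implied by a log-linear coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)


def coef_from_pct(pct):
    """Inverse of pct_change: recover the log-scale coefficient."""
    return math.log1p(pct / 100.0)


def share_of_gap_explained(pct_base, pct_adjusted):
    """Fraction of a log-scale gap removed when covariates are added."""
    return 1.0 - coef_from_pct(pct_adjusted) / coef_from_pct(pct_base)


# Illustrative pairing (assumed): a gap of 34.6% shrinking to 14.8%.
share = share_of_gap_explained(34.6, 14.8)
```

Under that assumed pairing, the share comes out to roughly one half, which matches the slide's headline. On a raw percent scale, the same pairing gives a somewhat larger share, so the chosen scale should be stated.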
16. County-level poverty has the strongest association with local jail rates
[Chart: Percent change in local jail rates associated with each covariate, marked significant or not significant. Covariates shown: County poverty (%); County unemployment (%); Federal inmates held in local jails (%) by state; Non-Hispanic black jail pop. (%); Total prison pop. (per 100,000) by state; Jail inmates (per 100,000) held for other counties; County welfare spending ($/100,000); Hispanic jail pop. (%); County Hispanic pop. (%); Jail inmates awaiting trial (%); County police & corrections spending ($/100,000); Jail inmates (per 100,000) held for the state; County non-Hispanic black population (%).]
17. How do counties compare to their peers?
Response residuals = observed – expected rates
Positive: observed (actual) rate is higher than expected
Negative: observed (actual) rate is lower than expected
Residuals were calculated from a separate model that excludes race/ethnicity to identify under- and over-performing counties
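The residual definition above is simple arithmetic. Below is a minimal Python sketch of computing response residuals, labeling them, and ranking counties from most over-expected to most under-expected. The function names and the dictionary layout are hypothetical. The expected rates would come from the fitted model.

```python
def response_residual(observed, expected):
    """Response residual: observed rate minus model-expected rate."""
    return observed - expected


def classify(observed, expected):
    """Label a county relative to its model-expected rate."""
    r = response_residual(observed, expected)
    if r > 0:
        return "higher than expected"
    if r < 0:
        return "lower than expected"
    return "as expected"


def rank_counties(rates):
    """rates: {county: (observed, expected)} -> [(county, residual), ...],
    sorted from most over-expected to most under-expected."""
    residuals = ((county, response_residual(obs, exp))
                 for county, (obs, exp) in rates.items())
    return sorted(residuals, key=lambda pair: pair[1], reverse=True)


# Maricopa County, AZ (values from the deck): actual 228, expected 114.
maricopa = response_residual(228, 114)
```

Maricopa County's residual is 114, so its observed rate is twice the expected rate and it is labeled "higher than expected".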
18. How do counties compare to their peers?
[Map: Arizona, average local jail rate per county (legend range 64 to 591)]
19. How do counties compare to their peers?
[Map: Arizona, average residuals per county (legend range -122 to 152)]
[Chart: Maricopa County, AZ. Actual rate: 228; expected rate: 114; national median: 206]
21. Residual clusters appear to adhere to state boundaries, suggesting state policies impact jail rates
[Map: average residuals per county (legend range -122 to 152)]
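One simple way to check whether residuals cluster by state is to average them within each state and measure how much of their total variance lies between states. Below is a minimal Python sketch of that check, assuming residuals are available as (county, state, residual) tuples. This between-group share is an illustrative technique, not a method stated in the deck; spatial autocorrelation statistics such as Moran's I are a common alternative. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_residual_by_state(county_residuals):
    """county_residuals: iterable of (county, state, residual) tuples.
    Returns {state: mean residual of that state's counties}."""
    by_state = defaultdict(list)
    for _county, state, residual in county_residuals:
        by_state[state].append(residual)
    return {state: mean(values) for state, values in by_state.items()}


def share_of_variance_between_states(county_residuals):
    """Between-state sum of squares divided by total sum of squares:
    1.0 means residuals vary only across states, 0.0 only within them."""
    rows = list(county_residuals)
    overall = mean(r for _, _, r in rows)
    state_means = average_residual_by_state(rows)
    total_ss = sum((r - overall) ** 2 for _, _, r in rows)
    between_ss = sum((state_means[s] - overall) ** 2 for _, s, _ in rows)
    return between_ss / total_ss if total_ss else 0.0
```

A share close to 1 would be consistent with residuals that follow state boundaries. A share close to 0 would suggest the clusters are local rather than state-wide.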
22. What we learned
Half of the rural-urban divide can be explained by model covariates
Poverty is important, more so than crime and arrest rates
Residuals reveal local and state geographical clusters
State policies may play an important role in local jail rates
23. Thank you
dataclinicinfo@twosigma.com
Co-authors:
Jacob Kang-Brown (Vera)
Chris Mulligan (Data Clinic)
Vinod Valsalam (Data Clinic)
Soumyo Chakraborty (Data Clinic)
Christian Henrichson (Vera)
Christine Zhang (Data Clinic)