adactio / collective

There are thirty-eight people in adactio’s collective.

Huffduffed (4836)

  1. Community Pulse - Episode 17 - The Importance of Interaction

    In this episode, Mary, Jason, and PJ sit down with the illustrious Anil Dash of Fog Creek Software to discuss interactions with people, why DevRel is so important yet misunderstood, and Anil’s cool new project,

    Links: How to get rid of the assholes; Glitch; Death Note; Humanification; Monument Valley 2; Cowboy Bebop; The Complacent Class… by Tyler Cowen; Tyler Cowen’s Blog; The Human Utility

    —Huffduffed by mathowie

  2. The Way Station

    Colin Devroe

    December 4, 2013

    This week’s guest on The Way Station is Colin Devroe. Colin is a longtime web entrepreneur who has cofounded Plain, the creators of Barley CMS and Barley for WordPress. We talk about technology and creating products that solve problems.

    Links: Plain, Colin on Twitter, Colin on ADN

    Favorite Things:
    Colin’s favorite things: Barley, Twitter Connect & Discover, and IFTTT
    Noah’s favorite thing: Square Cash

    Category: Development

    Tags: Software

    —Huffduffed by cdevroe

  3. Creating large training data sets quickly

    The O’Reilly Data Show Podcast: Alex Ratner on why weak supervision is the key to unlocking dark data.

    In this episode of the Data Show, I spoke with Alex Ratner, a graduate student at Stanford and a member of Christopher Ré’s Hazy research group. Training data has always been important in building machine learning algorithms, and the rise of data-hungry deep learning models has heightened the need for labeled data sets. In fact, the challenge of creating training data is ongoing for many companies; specific applications change over time, and what were gold standard data sets may no longer apply to changing situations.

    Ré and his collaborators proposed a framework for quickly building large training data sets. In essence, they observed that high-quality models can be constructed from noisy training data. Some of these ideas were discussed in a previous episode featuring Mike Cafarella (jump to minute 24:16 for a description of an earlier project called DeepDive).

    By developing a framework for mining low-quality sources in order to build high-quality machine learning models, Ré and his collaborators help researchers extract information previously hidden in unstructured data sources (so-called “dark data” buried in text, images, charts, and so on).

    Here are some highlights from my conversation with Ratner:

    Weak supervision and transfer learning

    Weak supervision is a term that people have used before, especially around Stanford, to talk about methods where we have lower-quality training data, or noisier training data. … At a high level, machine learning models are meant to be robust to some noise in the distribution they’re trained on. … One of the really important trends we’ve seen is that more people than ever are using deep learning models. Deep learning models can automate the feature engineering process, but they are more complex and they need more training data to fit to their parameters.

    If you look at the very remarkable, empirical successes that deep learning has had over the last few years, they have been mostly (or almost entirely) predicated on these large labeled training sets that took years to create. … Our motivation with weak supervision is really: how do we weaken this bottleneck? … For weak supervision, our ultimate goal is to make it easier for the human to provide supervision to the model. That’s where the human comes into the loop. This might be an iterative process.

    … In the standard transfer learning paradigm, you’d take one nicely collected training set, and you’d train your model on that in the standard way. Then you just try to apply your model to a new data distribution.

    Data programming

    Data programming is a general, flexible framework for using weak supervision to train some end model that you want to train without necessarily having any hand-labeled training data. The basic way it works is, we actually have two modeling stages in this pipeline. The first is that we get input from the domain expert or user in the form of what we call labeling functions. Think of them as Python functions. … The user writes a bunch of labeling functions, which are just black box functions that take in a data point, take in one of these objects, and output a label, or they could abstain. These labeling functions can encode all the types of weak supervision, like distant supervision, or crowd labels, or various heuristics. There’s a lot of flexibility because we don’t make any assumptions about what is inside them.
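
    As a rough illustration of the labeling functions Ratner describes, here is a minimal sketch in plain Python. It is not Snorkel's API: the label constants, the three heuristics, and the `apply_lfs` helper are all hypothetical names invented for this example, and the task (flagging customer complaints) is an assumed one.

```python
# Hypothetical label values; these names are illustrative, not taken from Snorkel.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_mentions_refund(text: str) -> int:
    """Heuristic: messages that mention a refund are probably complaints."""
    return POSITIVE if "refund" in text.lower() else ABSTAIN


def lf_says_thanks(text: str) -> int:
    """Heuristic: messages that say thanks are probably not complaints."""
    return NEGATIVE if "thank" in text.lower() else ABSTAIN


def lf_many_exclamations(text: str) -> int:
    """Heuristic: three or more exclamation marks suggest a complaint."""
    return POSITIVE if text.count("!") >= 3 else ABSTAIN


# Each function is a black box: it takes one data point and returns a label
# or abstains. Nothing is assumed about what happens inside it.
LABELING_FUNCTIONS = [lf_mentions_refund, lf_says_thanks, lf_many_exclamations]


def apply_lfs(texts):
    """Return one row of votes per text, one column per labeling function."""
    return [[lf(text) for lf in LABELING_FUNCTIONS] for text in texts]


if __name__ == "__main__":
    for row in apply_lfs(["I want a refund", "Thanks so much", "Refund now!!!"]):
        print(row)
```

    The resulting table of votes (rows are data points, columns are labeling functions) is the kind of input the modeling stages in the interview would work from.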

    In our first modeling stage, we use a generative model to learn which of the labeling functions are more or less accurate by observing where they overlap, where they agree and disagree. Intuitively, if we have 20 labeling functions from a user and we see that one labeling function is always agreeing with its co-labelers on various data points, we think we should trust it. When a labeling function is always disagreeing in a minority, then we downweight this. Basically, we learn this model that tells us how to weight the different labeling functions the user has provided. Then, the output of this model is a set of probabilistic training labels.
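
    The intuition in this paragraph can be sketched with a much simpler stand-in than the generative model Ratner refers to. The code below is an assumption-laden toy, not the method used in the research: it scores each labeling function by how often it agrees with the majority of its co-labelers, turns those scores into log-odds weights, and outputs one probability per data point. The function names and the smoothing constant are hypothetical.

```python
import math

ABSTAIN = -1  # binary labels are 0 and 1; -1 means the function abstained


def estimate_accuracies(votes, smoothing=1.0):
    """Estimate each labeling function's accuracy from agreement alone.

    votes: list of rows, one per data point, each a list of 0, 1, or ABSTAIN.
    A function earns credit when it matches the majority of the other
    functions that voted on the same data point. Smoothing keeps every
    estimate strictly between 0 and 1.
    """
    n_lfs = len(votes[0])
    agree = [smoothing] * n_lfs
    total = [2.0 * smoothing] * n_lfs
    for row in votes:
        for j, vote in enumerate(row):
            if vote == ABSTAIN:
                continue
            others = [v for k, v in enumerate(row) if k != j and v != ABSTAIN]
            ones = sum(others)
            zeros = len(others) - ones
            if ones == zeros:
                continue  # no co-labelers, or they are tied: no evidence
            majority = 1 if ones > zeros else 0
            total[j] += 1
            agree[j] += 1 if vote == majority else 0
    return [a / t for a, t in zip(agree, total)]


def probabilistic_labels(votes, accuracies):
    """Combine votes into P(label = 1) using log-odds weights per function."""
    probs = []
    for row in votes:
        log_odds = 0.0
        for vote, acc in zip(row, accuracies):
            if vote == ABSTAIN:
                continue
            weight = math.log(acc / (1.0 - acc))
            log_odds += weight if vote == 1 else -weight
        probs.append(1.0 / (1.0 + math.exp(-log_odds)))
    return probs


if __name__ == "__main__":
    votes = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, ABSTAIN], [0, 0, 1]]
    accuracies = estimate_accuracies(votes)
    print(accuracies)
    print(probabilistic_labels(votes, accuracies))
```

    In this toy, a function that keeps disagreeing with its co-labelers ends up with a low estimated accuracy and therefore a small or negative weight, and a data point on which every function abstains gets a probability of exactly 0.5.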

    Then we feed these into the end model we’re trying to train. To give you some intuition on the probabilistic labels: all we’re basically saying is that we want the end model to learn more from data points that got a lot of high confidence votes, rather than the ones that were sort of in contention, from the labeling functions that the user provided. … One goal is to generate data, but often our ultimate goal is to train some end discriminative model, say to do image classification.
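
    To make the role of the probabilistic labels concrete, here is a small sketch of an end model trained on soft targets. It assumes a plain logistic regression fitted by gradient descent on cross-entropy, which is only a stand-in for whatever discriminative model one actually wants to train; the function names, learning rate, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Return the model's probability that x has label 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_soft_logreg(features, soft_labels, lr=0.5, epochs=500):
    """Fit logistic regression to probabilistic labels in [0, 1].

    The cross-entropy gradient for one example is (prediction - soft_label) * x,
    so a target near 0 or 1 pulls the model firmly toward one class, while a
    contested target near 0.5 only pulls the prediction toward 0.5.
    """
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in zip(features, soft_labels):
            error = predict(weights, bias, x) - target
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


if __name__ == "__main__":
    # Toy one-dimensional data with made-up probabilistic labels.
    features = [[0.0], [1.0], [2.0], [3.0]]
    soft_labels = [0.05, 0.2, 0.8, 0.95]
    weights, bias = train_soft_logreg(features, soft_labels)
    print(predict(weights, bias, [0.0]), predict(weights, bias, [3.0]))
```

    The only change from ordinary supervised training is that the targets are probabilities rather than hard 0/1 labels.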

    … Snorkel is a system for using this data programming technique to quickly generate training data. A lot of the tooling and the use cases that are publicly part of Snorkel right now are around text extraction.

    Data Programming in Snorkel. Slide from Alex Ratner, used with permission.

    Related resources:

    From search to distributed computing to large-scale information extraction: a conversation with Mike Cafarella (jump to minute 24:16 for a description of an earlier project called DeepDive)

    Data preparation in the age of deep learning

    Adam Marcus: Building human-assisted AI applications

    —Huffduffed by globalmoxie

  4. Together then Alone: A Wellfleet Artist Grieving and Reinventing a Life | WCAI

    Bob Henry and Selina Trieff came to the Cape to paint in the 1950s. For six decades, they thrived in love and art. Bob’s work has always evoked a world

    —Huffduffed by stan

  5. Unpleasant Design & Hostile Urban Architecture - 99% Invisible

    —Huffduffed by globalmoxie

  6. May 26, 2017: Investigative reporting on frequent lottery winners | Member Supported Public Television, Radio | WCNY

    We looked at journalistic investigations into lottery winners who have won hundreds, if not thousands, of times. Susan spoke with investigative journalist Jeff

    —Huffduffed by stan

  7. Game On - Visiting Improbable 230517

    Site visit to world builders, Improbable

    —Huffduffed by iamdanw

  8. Episode 12: Linda Eliasen

    In our latest Overtime episode, Dan chats with Linda Eliasen—a designer, illustrator, art director, and all-around creative. Linda currently freelances in NYC, but before that, she worked at Ueno, Dropbox, Mailchimp, and Squarespace.

    Linda Eliasen: Art Director, Designer, Illustrator

    In this episode, Linda walks us through her illustration workflow and shares her process for creating production-ready work with the Apple Pencil and iPad Pro. In addition, you’ll learn about Iceland’s terrifying Yule Cat. She also shares the story behind Dropbox’s recruiting video starring puppets. Lastly, Linda talks about her recent move to New York to try something new: improv. “It’s awesome to be at the stage in my career, where I’ve been doing what I’ve been doing for 9 years. To start over—it feels like being a brand new baby junior designer.”

    —Huffduffed by briansuda

  9. Logistics - Violence, Empire And Resistance

    —Huffduffed by iamdanw

  10. Episode 293: THE ENTREPRENEURS Making sense of the world

    Jan Chipchase is the founder of Studio D Radiodurans, a consultancy that’s perhaps like none other in the world. He and his team travel to the far edges of the earth on behalf of clients to immerse themselves in difficult environments and understand human behaviour. He’s recently distilled his years of experience into a beautiful crowdfunded guide called ‘The Field Study Handbook’. This week Jan shares lessons for travelling anywhere, making sense of the world and making a difference.

    —Huffduffed by briansuda
