Algorithms make important decisions that affect our lives, from how we are policed to what advertisements we see online, yet the datasets on which they’re built are often inconsistent, unrepresentative, and not always appropriately vetted for their intended use. The problem with bad outcomes isn’t always the machine or the algorithm; it is often the health of the data itself. To address this problem, many initiatives are working to enable ‘healthier’ datasets, borrowing from established standards paradigms. This panel will bring together experts from academia, standards bodies, and regulation to discuss methods for addressing problematic inputs.
• What information about a dataset should be available and accessible to a data scientist in order to support responsible model building?
• What information would be helpful for other audiences (policymakers, communities, etc.), and how might this information differ depending on the audience?
• What are some ways to present this information? Can we draw from parallel efforts in domains such as warning labels and nutrition labels?
• How might industry, government, and academia work together to create and support standards for dataset labeling?
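As a thought experiment for the questions above, one could imagine a machine-readable “dataset label” that travels with the data and surfaces caveats before model building. The sketch below is purely illustrative: the schema, field names, and warning logic are hypothetical assumptions, not any existing standard.

```python
# Hypothetical sketch of a machine-readable dataset label, loosely inspired
# by datasheet and nutrition-label proposals. All field names are invented
# for illustration; no established standard is being implemented here.
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetLabel:
    name: str
    intended_uses: list          # tasks the dataset was vetted for
    collection_method: str       # how the records were gathered
    known_gaps: list = field(default_factory=list)  # under-represented groups, etc.
    last_audited: str = "unknown"

    def warnings(self):
        """Surface caveats a data scientist should see before modeling."""
        notes = []
        if self.known_gaps:
            notes.append("Representation gaps: " + ", ".join(self.known_gaps))
        if self.last_audited == "unknown":
            notes.append("No recorded audit date")
        return notes


label = DatasetLabel(
    name="example-arrest-records",
    intended_uses=["aggregate reporting"],
    collection_method="administrative records, single jurisdiction",
    known_gaps=["unreported incidents", "one city only"],
)
print(asdict(label)["name"])
for note in label.warnings():
    print("WARNING:", note)
```

A structured label like this also hints at how the same information could be rendered differently for different audiences, e.g. raw fields for data scientists versus summarized warnings for policymakers.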